PREFACE


It is a pleasure to welcome Professor Masoud Salehi as a coauthor to the fifth edition 
of Digital Communications. This new edition has undergone a major revision and 
reorganization of topics, especially in the area of channel coding and decoding. A new 
chapter on multiple-antenna systems has been added as well. 

The book is designed to serve as a text for a first-year graduate-level course for 
students in electrical engineering. It is also designed to serve as a text for self-study 
and as a reference book for the practicing engineer involved in the design and analysis 
of digital communications systems. As to background, we presume that the reader has 
a thorough understanding of basic calculus and elementary linear systems theory and 
prior knowledge of probability and stochastic processes. 

Chapter 1 is an introduction to the subject, including a historical perspective and 
a description of channel characteristics and channel models. 

Chapter 2 contains a review of deterministic and random signal analysis, including 
bandpass and lowpass signal representations, bounds on the tail probabilities of random 
variables, limit theorems for sums of random variables, and random processes. 

Chapter 3 treats digital modulation techniques and the power spectrum of digitally 
modulated signals. 

Chapter 4 is focused on optimum receivers for additive white Gaussian noise 
(AWGN) channels and their error rate performance. Also included in this chapter is 
an introduction to lattices and signal constellations based on lattices, as well as link 
budget analyses for wireline and radio communication systems. 

Chapter 5 is devoted to carrier phase estimation and time synchronization methods
based on the maximum-likelihood criterion. Both decision-directed and
non-decision-directed methods are described.

Chapter 6 provides an introduction to topics in information theory, including 
lossless source coding, lossy data compression, channel capacity for different channel 
models, and the channel reliability function. 

Chapter 7 treats linear block codes and their properties. Included is a treatment 
of cyclic codes, BCH codes, Reed-Solomon codes, and concatenated codes. Both soft 
decision and hard decision decoding methods are described, and their performance in 
AWGN channels is evaluated. 

Chapter 8 provides a treatment of trellis codes and graph-based codes, including
convolutional codes, turbo codes, low-density parity-check (LDPC) codes, trellis
codes for band-limited channels, and codes based on lattices. Decoding algorithms
are also treated, including the Viterbi algorithm and its performance on AWGN
channels, the BCJR algorithm for iterative decoding of turbo codes, and the sum-product
algorithm.

Chapter 9 is focused on digital communication through band-limited channels. 
Topics treated in this chapter include the characterization and signal design for
band-limited channels, the optimum receiver for channels with intersymbol interference
and AWGN, and suboptimum equalization methods, namely, linear equalization,
decision-feedback equalization, and turbo equalization.

Chapter 10 treats adaptive channel equalization. The LMS and recursive least-squares
algorithms are described together with their performance characteristics. This
chapter also includes a treatment of blind equalization algorithms. 

Chapter 11 provides a treatment of multichannel and multicarrier modulation. 
Topics treated include the error rate performance of multichannel binary signals and
M-ary orthogonal signals in AWGN channels; the capacity of a nonideal linear filter
channel with AWGN; OFDM modulation and demodulation; bit and power allocation
in an OFDM system; and methods to reduce the peak-to-average power ratio in
OFDM.

Chapter 12 is focused on spread spectrum signals and systems, with emphasis
on direct sequence and frequency-hopped spread spectrum systems and their performance.
The benefits of coding in the design of spread spectrum signals are emphasized
throughout this chapter.

Chapter 13 treats communication through fading channels, including the characterization
of fading channels and the key parameters of multipath spread and
Doppler spread. Several statistical models for channel fading are introduced, with emphasis
placed on Rayleigh fading, Ricean fading, and Nakagami fading. An analysis of the
performance degradation caused by Doppler spread in an OFDM system is presented, 
and a method for reducing this performance degradation is described. 

Chapter 14 is focused on capacity and code design for fading channels. After introducing
ergodic and outage capacities, the chapter studies coding for fading channels.
Bandwidth-efficient coding and bit-interleaved coded modulation are treated, and the performance
of coded systems in Rayleigh and Ricean fading is derived. 

Chapter 15 provides a treatment of multiple-antenna systems, generally called 
multiple-input, multiple-output (MIMO) systems, which are designed to yield spatial 
signal diversity and spatial multiplexing. Topics treated in this chapter include detection 
algorithms for MIMO channels, the capacity of MIMO channels with AWGN without 
and with signal fading, and space-time coding. 

Chapter 16 treats multiuser communications, including the topics of the capacity 
of multiple-access methods, multiuser detection methods for the uplink in CDMA 
systems, interference mitigation in multiuser broadcast channels, and random access 
methods such as ALOHA and carrier-sense multiple access (CSMA). 

With 16 chapters and a variety of topics, the instructor has the flexibility to design 
either a one- or two-semester course. Chapters 3, 4, and 5 provide a basic treatment of 
digital modulation/demodulation and detection methods. Channel coding and decoding 
treated in Chapters 7, 8, and 9 can be included along with modulation/demodulation 
in a one-semester course. Alternatively, Chapters 9 through 12 can be covered in place 
of channel coding and decoding. A second semester course can cover the topics of 

Introduction


In this book, we present the basic principles that underlie the analysis and design
of digital communication systems. The subject of digital communications involves the 
transmission of information in digital form from a source that generates the information 
to one or more destinations. Of particular importance in the analysis and design of 
communication systems are the characteristics of the physical channels through which 
the information is transmitted. The characteristics of the channel generally affect the 
design of the basic building blocks of the communication system. Below, we describe 
the elements of a communication system and their functions. 


1.1 ELEMENTS OF A DIGITAL COMMUNICATION SYSTEM

Figure 1.1-1 illustrates the functional diagram and the basic elements of a digital 
communication system. The source output may be either an analog signal, such as an 
audio or video signal, or a digital signal, such as the output of a computer, that is discrete 
in time and has a finite number of output characters. In a digital communication system, 
the messages produced by the source are converted into a sequence of binary digits. 
Ideally, we should like to represent the source output (message) by as few binary digits 
as possible. In other words, we seek an efficient representation of the source output 
that results in little or no redundancy. The process of efficiently converting the output 
of either an analog or digital source into a sequence of binary digits is called source 
encoding or data compression. 

FIGURE 1.1-1
Basic elements of a digital communication system.

The sequence of binary digits from the source encoder, which we call the information
sequence, is passed to the channel encoder. The purpose of the channel encoder
is to introduce, in a controlled manner, some redundancy in the binary information
sequence that can be used at the receiver to overcome the effects of noise and interference
encountered in the transmission of the signal through the channel. Thus, the
added redundancy serves to increase the reliability of the received data and improve
the fidelity of the received signal. In effect, redundancy in the information sequence
aids the receiver in decoding the desired information sequence. For example, a (trivial)
form of encoding of the binary information sequence is simply to repeat each binary
digit m times, where m is some positive integer. More sophisticated (nontrivial) encoding
involves taking k information bits at a time and mapping each k-bit sequence into
a unique n-bit sequence, called a code word. The amount of redundancy introduced by
encoding the data in this manner is measured by the ratio n/k. The reciprocal of this
ratio, namely k/n, is called the rate of the code or, simply, the code rate.
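
The repetition example and the code rate can be made concrete with a minimal sketch. The function names and the majority-vote decoder are illustrative assumptions; the text specifies only the repetition encoder and the ratios n/k and k/n.

```python
from fractions import Fraction


def repetition_encode(bits, m):
    """Repeat each information bit m times (m is a positive integer)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return [bit for bit in bits for _ in range(m)]


def majority_decode(coded, m):
    """Assumed decoder: majority vote over each block of m received bits.

    A tie (possible only for even m) is resolved to 0.
    """
    blocks = [coded[i:i + m] for i in range(0, len(coded), m)]
    return [1 if 2 * sum(block) > m else 0 for block in blocks]


def code_rate(k, n):
    """Code rate k/n for a code mapping k information bits to n coded bits."""
    return Fraction(k, n)


info = [1, 0, 1]
coded = repetition_encode(info, 3)      # each bit sent three times
received = list(coded)
received[1] ^= 1                        # one channel error in the first block
decoded = majority_decode(received, 3)  # the single error is outvoted
rate = code_rate(1, 3)                  # k = 1, n = 3, so n/k = 3
```

With m = 3 each block tolerates one bit error, at the cost of a code rate of 1/3.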

The binary sequence at the output of the channel encoder is passed to the digital 
modulator, which serves as the interface to the communication channel. Since nearly 
all the communication channels encountered in practice are capable of transmitting 
electrical signals (waveforms), the primary purpose of the digital modulator is to map 
the binary information sequence into signal waveforms. To elaborate on this point, let 
us suppose that the coded information sequence is to be transmitted one bit at a time at 
some uniform rate R bits per second (bits/s). The digital modulator may simply map the 
binary digit 0 into a waveform $s_0(t)$ and the binary digit 1 into a waveform $s_1(t)$. In this
manner, each bit from the channel encoder is transmitted separately. We call this binary
modulation. Alternatively, the modulator may transmit b coded information bits at a
time by using $M = 2^b$ distinct waveforms $s_i(t)$, $i = 0, 1, \ldots, M - 1$, one waveform
for each of the $2^b$ possible b-bit sequences. We call this M-ary modulation (M > 2).
Note that a new b-bit sequence enters the modulator every b/R seconds. Hence, when
the channel bit rate R is fixed, the amount of time available to transmit one of the M
waveforms corresponding to a b-bit sequence is b times the time period in a system
that uses binary modulation. 
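
As a sketch of the bookkeeping behind M-ary modulation, the code below groups coded bits into b-bit blocks, maps each block to a waveform index i in 0, ..., M - 1, and computes the interval b/R available per waveform. The natural-binary mapping (first bit most significant) and the function names are assumptions made for illustration; the text does not prescribe a particular bit-to-waveform assignment.

```python
def bits_to_symbol_indices(bits, b):
    """Map each b-bit block to an index i in 0..M-1, where M = 2**b.

    Assumed mapping: natural binary, first bit is the most significant.
    """
    if b < 1:
        raise ValueError("b must be a positive integer")
    if len(bits) % b != 0:
        raise ValueError("number of bits must be a multiple of b")
    indices = []
    for start in range(0, len(bits), b):
        index = 0
        for bit in bits[start:start + b]:
            index = (index << 1) | bit
        indices.append(index)
    return indices


def symbol_interval(b, bit_rate):
    """Time available per waveform, b/R seconds, at bit rate R bits/s."""
    return b / bit_rate


coded_bits = [0, 0, 0, 1, 1, 0, 1, 1]
indices = bits_to_symbol_indices(coded_bits, 2)    # M = 4 waveforms
binary_interval = symbol_interval(1, 1000.0)       # binary modulation, 1/R
quaternary_interval = symbol_interval(2, 1000.0)   # b = 2, twice as long
```

For a fixed R, the interval grows linearly with b, which is the statement made in the paragraph above.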

The communication channel is the physical medium that is used to send the signal 
from the transmitter to the receiver. In wireless transmission, the channel may be the 
atmosphere (free space). On the other hand, telephone channels usually employ a variety 
of physical media, including wire lines, optical fiber cables, and wireless (microwave 
radio). Whatever the physical medium used for transmission of the information, the 
essential feature is that the transmitted signal is corrupted in a random manner by a 
variety of possible mechanisms, such as additive thermal noise generated by electronic
devices; man-made noise, e.g., automobile ignition noise; and atmospheric noise, e.g., 
electrical lightning discharges during thunderstorms. 

At the receiving end of a digital communication system, the digital demodulator 
processes the channel-corrupted transmitted waveform and reduces the waveforms to 
a sequence of numbers that represent estimates of the transmitted data symbols (binary 
or M-ary). This sequence of numbers is passed to the channel decoder, which attempts
to reconstruct the original information sequence from knowledge of the code used by 
the channel encoder and the redundancy contained in the received data. 

A measure of how well the demodulator and decoder perform is the frequency with 
which errors occur in the decoded sequence. More precisely, the average probability 
of a bit error at the output of the decoder is a measure of the performance of the
demodulator-decoder combination. In general, the probability of error is a function of 
the code characteristics, the types of waveforms used to transmit the information over 
the channel, the transmitter power, the characteristics of the channel (i.e., the amount 
of noise, the nature of the interference), and the method of demodulation and decoding. 
These items and their effect on performance will be discussed in detail in subsequent 
chapters. 
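
As a rough illustration, the frequency of errors in the decoded sequence can be estimated empirically by comparing it with the transmitted information sequence. The helper name `bit_error_rate` is hypothetical, and the result is only a sample estimate of the average bit-error probability, not the probability itself.

```python
def bit_error_rate(transmitted, decoded):
    """Fraction of positions where the decoded bit differs from the sent bit."""
    if len(transmitted) != len(decoded):
        raise ValueError("sequences must have the same length")
    if not transmitted:
        raise ValueError("sequences must be non-empty")
    errors = sum(1 for sent, got in zip(transmitted, decoded) if sent != got)
    return errors / len(transmitted)


sent = [0, 1, 1, 0, 1, 0, 0, 1]
got = [0, 1, 0, 0, 1, 0, 1, 1]
estimate = bit_error_rate(sent, got)  # two errors in eight bits
```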

As a final step, when an analog output is desired, the source decoder accepts the 
output sequence from the channel decoder and, from knowledge of the source encoding 
method used, attempts to reconstruct the original signal from the source. Because of 
channel decoding errors and possible distortion introduced by the source encoder, 
and perhaps, the source decoder, the signal at the output of the source decoder is an 
approximation to the original source output. The difference or some function of the 
difference between the original signal and the reconstructed signal is a measure of the 
distortion introduced by the digital communication system. 


1.2 COMMUNICATION CHANNELS AND THEIR CHARACTERISTICS

As indicated in the preceding discussion, the communication channel provides the connection
between the transmitter and the receiver. The physical channel may be a pair of
wires that carry the electrical signal, or an optical fiber that carries the information on a
modulated light beam, or an underwater ocean channel in which the information is transmitted
acoustically, or free space over which the information-bearing signal is radiated
by use of an antenna. Other media that can be characterized as communication channels 
are data storage media, such as magnetic tape, magnetic disks, and optical disks. 

One common problem in signal transmission through any channel is additive noise. 
In general, additive noise is generated internally by components such as resistors and 
solid-state devices used to implement the communication system. This is sometimes 
called thermal noise. Other sources of noise and interference may arise externally to 
the system, such as interference from other users of the channel. When such noise 
and interference occupy the same frequency band as the desired signal, their effect 
can be minimized by the proper design of the transmitted signal and its demodulator at 
the receiver. Other types of signal degradations that may be encountered in transmission
over the channel are signal attenuation, amplitude and phase distortion, and multipath 
distortion. 

The effects of noise may be minimized by increasing the power in the transmitted 
signal. However, equipment and other practical constraints limit the power level in 
the transmitted signal. Another basic limitation is the available channel bandwidth. 
A bandwidth constraint is usually due to the physical limitations of the medium and 
the electronic components used to implement the transmitter and the receiver. These 
two limitations constrain the amount of data that can be transmitted reliably over any 
communication channel as we shall observe in later chapters. Below, we describe some 
of the important characteristics of several communication channels. 

Wireline Channels 

The telephone network makes extensive use of wire lines for voice signal transmission, 
as well as data and video transmission. Twisted-pair wire lines and coaxial cable are 
basically guided electromagnetic channels that provide relatively modest bandwidths. 
Telephone wire generally used to connect a customer to a central office has a bandwidth 
of several hundred kilohertz (kHz). On the other hand, coaxial cable has a usable 
bandwidth of several megahertz (MHz). Figure 1.2-1 illustrates the frequency range of 
guided electromagnetic channels, which include waveguides and optical fibers. 

Signals transmitted through such channels are distorted in both amplitude and 
phase and further corrupted by additive noise. Twisted-pair wireline channels are also 
prone to crosstalk interference from physically adjacent channels. Because wireline 
channels carry a large percentage of our daily communications around the country and 
the world, much research has been performed on the characterization of their transmission
properties and on methods for mitigating the amplitude and phase distortion
encountered in signal transmission. In Chapter 9, we describe methods for designing 
optimum transmitted signals and their demodulation; in Chapter 10, we consider the 
design of channel equalizers that compensate for amplitude and phase distortion on 
these channels. 

Fiber-Optic Channels 

Optical fibers offer the communication system designer a channel bandwidth that is 
several orders of magnitude larger than coaxial cable channels. During the past two 
decades, optical fiber cables have been developed that have a relatively low signal attenuation,
and highly reliable photonic devices have been developed for signal generation
and signal detection. These technological advances have resulted in a rapid deployment
of optical fiber channels, both in domestic telecommunication systems and
for transcontinental communication. With the large bandwidth available on fiber-optic
channels, it is possible for telephone companies to offer subscribers a wide array of 
telecommunication services, including voice, data, facsimile, and video. 

The transmitter or modulator in a fiber-optic communication system is a light 
source, either a light-emitting diode (LED) or a laser. Information is transmitted by 
varying (modulating) the intensity of the light source with the message signal. The light 
propagates through the fiber as a light wave and is amplified periodically (in the case of 
digital transmission, it is detected and regenerated by repeaters) along the transmission
path to compensate for signal attenuation. At the receiver, the light intensity is detected 
by a photodiode, whose output is an electrical signal that varies in direct proportion 
to the power of the light impinging on the photodiode. Sources of noise in fiber-optic 
channels are photodiodes and electronic amplifiers. 


Wireless Electromagnetic Channels 

In wireless communication systems, electromagnetic energy is coupled to the propagation
medium by an antenna which serves as the radiator. The physical size and
the configuration of the antenna depend primarily on the frequency of operation. To
obtain efficient radiation of electromagnetic energy, the antenna must be longer than
1/10 of the wavelength. Consequently, a radio station transmitting in the amplitude-modulated
(AM) frequency band, say at $f_c = 1$ MHz [corresponding to a wavelength
of $\lambda = c/f_c = 300$ meters (m)], requires an antenna of at least 30 m. Other important
characteristics and attributes of antennas for wireless transmission are described in
Chapter 4.
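
The AM example above can be reproduced with a short calculation. This is a sketch that takes the one-tenth-wavelength rule from the text and assumes the rounded value c = 3 x 10^8 m/s, which is what yields the 300 m wavelength quoted; the function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 3.0e8  # rounded value, assumed to match the text


def wavelength_m(carrier_hz):
    """Wavelength lambda = c / f_c in meters."""
    if carrier_hz <= 0:
        raise ValueError("carrier frequency must be positive")
    return SPEED_OF_LIGHT_M_PER_S / carrier_hz


def minimum_antenna_length_m(carrier_hz):
    """Rule of thumb from the text: at least one tenth of the wavelength."""
    return wavelength_m(carrier_hz) / 10.0


am_wavelength = wavelength_m(1.0e6)                 # f_c = 1 MHz
am_min_length = minimum_antenna_length_m(1.0e6)
```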

Figure 1.2-2 illustrates the various frequency bands of the electromagnetic spectrum.
The mode of propagation of electromagnetic waves in the atmosphere and in
free space may be subdivided into three categories, namely, ground-wave propagation,
sky-wave propagation, and line-of-sight (LOS) propagation. In the very low frequency
(VLF) and audio frequency bands, where the wavelengths exceed 10 km, the earth
and the ionosphere act as a waveguide for electromagnetic wave propagation. In these
frequency ranges, communication signals practically propagate around the globe. For
this reason, these frequency bands are primarily used to provide navigational aids from
shore to ships around the world. The channel bandwidths available in these frequency
bands are relatively small (usually 1-10 percent of the center frequency), and hence the
information that is transmitted through these channels is of relatively slow speed and
generally confined to digital transmission. A dominant type of noise at these frequencies
is generated from thunderstorm activity around the globe, especially in tropical
regions. Interference results from the many users of these frequency bands.

FIGURE 1.2-2
Frequency range for wireless electromagnetic channels. [Adapted from Carlson (1975), 2nd
edition, © McGraw-Hill Book Company Co. Reprinted with permission of the publisher.]

Ground-wave propagation, as illustrated in Figure 1.2-3, is the dominant mode of
propagation for frequencies in the medium frequency (MF) band (0.3-3 MHz). This is
the frequency band used for AM broadcasting and maritime radio broadcasting. In AM
broadcasting, the range with ground-wave propagation of even the more powerful radio
stations is limited to about 150 km. Atmospheric noise, man-made noise, and thermal 
noise from electronic components at the receiver are dominant disturbances for signal 
transmission in the MF band. 

Sky-wave propagation, as illustrated in Figure 1.2-4, results from transmitted signals
being reflected (bent or refracted) from the ionosphere, which consists of several
layers of charged particles ranging in altitude from 50 to 400 km above the surface of 
the earth. During the daytime hours, the heating of the lower atmosphere by the sun 
causes the formation of the lower layers at altitudes below 120 km. These lower layers, 
especially the D-layer, serve to absorb frequencies below 2 MHz, thus severely limiting 
sky-wave propagation of AM radio broadcast. However, during the nighttime hours, the 
electron density in the lower layers of the ionosphere drops sharply and the frequency 
absorption that occurs during the daytime is significantly reduced. As a consequence, 
powerful AM radio broadcast stations can propagate over large distances via sky wave 
over the F-layer of the ionosphere, which ranges from 140 to 400 km above the surface 
of the earth. 



FIGURE 1.2-4 

Illustration of sky-wave propagation. 




A frequently occurring problem with electromagnetic wave propagation via sky 
wave in the high frequency (HF) range is signal multipath. Signal multipath occurs 
when the transmitted signal arrives at the receiver via multiple propagation paths at different
delays. It generally results in intersymbol interference in a digital communication
system. Moreover, the signal components arriving via different propagation paths may 
add destructively, resulting in a phenomenon called signal fading, which most people 
have experienced when listening to a distant radio station at night when sky wave is 
the dominant propagation mode. Additive noise in the HF range is a combination of 
atmospheric noise and thermal noise. 

Sky-wave ionospheric propagation ceases to exist at frequencies above approximately
30 MHz, which is the end of the HF band. However, it is possible to have
ionospheric scatter propagation at frequencies in the range 30-60 MHz, resulting from 
signal scattering from the lower ionosphere. It is also possible to communicate over 
distances of several hundred miles by use of tropospheric scattering at frequencies in 
the range 40-300 MHz. Troposcatter results from signal scattering due to particles 
in the atmosphere at altitudes of 10 miles or less. Generally, ionospheric scatter and 
tropospheric scatter involve large signal propagation losses and require a large amount 
of transmitter power and relatively large antennas. 

Frequencies above 30 MHz propagate through the ionosphere with relatively little 
loss and make satellite and extraterrestrial communications possible. Hence, at frequencies
in the very high frequency (VHF) band and higher, the dominant mode of
electromagnetic propagation is LOS propagation. For terrestrial communication systems,
this means that the transmitter and receiver antennas must be in direct LOS with
relatively little or no obstruction. For this reason, television stations transmitting in the 
VHF and ultra high frequency (UHF) bands mount their antennas on high towers to 
achieve a broad coverage area. 

In general, the coverage area for LOS propagation is limited by the curvature of 
the earth. If the transmitting antenna is mounted at a height of h meters above the surface of
the earth, the distance to the radio horizon, assuming no physical obstructions such
as mountains, is approximately $d = \sqrt{15h}$ km. For example, a television antenna
mounted on a tower of 300 m in height provides a coverage of approximately 67 km.
As another example, microwave radio relay systems used extensively for telephone and 
video transmission at frequencies above 1 gigahertz (GHz) have antennas mounted on 
tall towers or on the top of tall buildings. 
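
The radio-horizon approximation can be checked numerically. The sketch below evaluates d = sqrt(15 h) with h in meters and d in kilometers, as stated in the text; the function name is hypothetical, and the formula is the text's approximation for an unobstructed path.

```python
import math


def radio_horizon_km(antenna_height_m):
    """Approximate distance to the radio horizon, d = sqrt(15 * h) km."""
    if antenna_height_m < 0:
        raise ValueError("antenna height must be non-negative")
    return math.sqrt(15.0 * antenna_height_m)


tower_coverage = radio_horizon_km(300.0)  # about 67 km for a 300 m tower
```

Because d grows only as the square root of h, quadrupling the tower height merely doubles the coverage distance.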

The dominant noise limiting the performance of a communication system in VHF 
and UHF ranges is thermal noise generated in the receiver front end and cosmic noise 
picked up by the antenna. At frequencies in the super high frequency (SHF) band above 
10 GHz, atmospheric conditions play a major role in signal propagation. For example, 
at 10 GHz, the attenuation ranges from about 0.003 decibel per kilometer (dB/km) in 
light rain to about 0.3 dB/km in heavy rain. At 100 GHz, the attenuation ranges from 
about 0.1 dB/km in light rain to about 6 dB/km in heavy rain. Hence, in this frequency
range, heavy rain introduces extremely high propagation losses that can result in service 
outages (total breakdown in the communication system). 
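
To see why these specific attenuations matter, the per-kilometer figures can be scaled to a path length. The sketch below assumes, purely for illustration, that the rain rate is uniform along the whole path and that rain is the only loss counted; the function names and the 10 km path are hypothetical.

```python
def rain_loss_db(specific_attenuation_db_per_km, path_km):
    """Total rain loss in dB, assuming uniform rain along the path."""
    if path_km < 0:
        raise ValueError("path length must be non-negative")
    return specific_attenuation_db_per_km * path_km


def received_power_fraction(loss_db):
    """Fraction of power remaining after a loss expressed in dB."""
    return 10.0 ** (-loss_db / 10.0)


# Heavy rain over an assumed 10 km path, using the figures quoted in the text.
loss_10_ghz = rain_loss_db(0.3, 10.0)    # 0.3 dB/km at 10 GHz
loss_100_ghz = rain_loss_db(6.0, 10.0)   # 6 dB/km at 100 GHz
fraction_100_ghz = received_power_fraction(loss_100_ghz)
```

Under these assumptions the 100 GHz link loses 60 dB, leaving one millionth of the power, against 3 dB at 10 GHz.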

At frequencies above the extremely high frequency (EHF) band, we have the infrared
and visible light regions of the electromagnetic spectrum, which can be used
to provide LOS optical communication in free space. To date, these frequency bands
have been used in experimental communication systems, such as satellite-to-satellite
links.

Underwater Acoustic Channels 

Over the past few decades, ocean exploration activity has been steadily increasing. 
Coupled with this increase is the need to transmit data, collected by sensors placed 
under water, to the surface of the ocean. From there, it is possible to relay the data via 
a satellite to a data collection center. 

Electromagnetic waves do not propagate over long distances under water except at 
extremely low frequencies. However, the transmission of signals at such low frequencies 
is prohibitively expensive because of the large and powerful transmitters required. The 
attenuation of electromagnetic waves in water can be expressed in terms of the skin 
depth, which is the distance over which a signal is attenuated by 1/e. For seawater, the skin depth
is $\delta = 250/\sqrt{f}$, where f is expressed in Hz and $\delta$ is in m. For example, at 10 kHz, the
skin depth is 2.5 m. In contrast, acoustic signals propagate over distances of tens and
even hundreds of kilometers. 
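
The seawater skin-depth formula lends itself to a quick numerical check. In the sketch below, `skin_depth_m` implements the expression from the text; `relative_amplitude` additionally assumes the usual exponential decay with distance, so that one skin depth corresponds to a factor of 1/e. Both function names are hypothetical.

```python
import math


def skin_depth_m(frequency_hz):
    """Skin depth in seawater, delta = 250 / sqrt(f), f in Hz, delta in m."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 250.0 / math.sqrt(frequency_hz)


def relative_amplitude(distance_m, frequency_hz):
    """Assumed exponential decay: amplitude falls by 1/e per skin depth."""
    return math.exp(-distance_m / skin_depth_m(frequency_hz))


depth_10_khz = skin_depth_m(10.0e3)               # 2.5 m, as in the text
after_25_m = relative_amplitude(25.0, 10.0e3)     # ten skin depths, e**-10
```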

An underwater acoustic channel is characterized as a multipath channel due to 
signal reflections from the surface and the bottom of the sea. Because of wave motion,
the signal multipath components undergo time-varying propagation delays that
result in signal fading. In addition, there is frequency-dependent attenuation, which is 
approximately proportional to the square of the signal frequency. The sound velocity 
is nominally about 1500 m/s, but the actual value will vary either above or below the 
nominal value depending on the depth at which the signal propagates. 

Ambient ocean acoustic noise is caused by shrimp, fish, and various mammals. 
Near harbors, there is also man-made acoustic noise in addition to the ambient noise. 
In spite of this hostile environment, it is possible to design and implement efficient and 
highly reliable underwater acoustic communication systems for transmitting digital 
signals over large distances. 

Storage Channels 

Information storage and retrieval systems constitute a very significant part of data-handling
activities on a daily basis. Magnetic tape, including digital audiotape and
videotape, magnetic disks used for storing large amounts of computer data, optical 
disks used for computer data storage, and compact disks are examples of data storage 
systems that can be characterized as communication channels. The process of storing 
data on a magnetic tape or a magnetic or optical disk is equivalent to transmitting 
a signal over a telephone or a radio channel. The readback process and the signal 
processing involved in storage systems to recover the stored information are equivalent 
to the functions performed by a receiver in a telephone or radio communication system 
to recover the transmitted information. 

Additive noise generated by the electronic components and interference from adjacent
tracks are generally present in the readback signal of a storage system, just as is
the case in a telephone or a radio communication system.

The amount of data that can be stored is generally limited by the size of the disk 
or tape and the density (number of bits stored per square inch) that can be achieved by 
the write/read electronic systems and heads. For example, a packing density of $10^9$ bits
per square inch has been demonstrated in magnetic disk storage systems. The speed at 
which data can be written on a disk or tape and the speed at which it can be read back 
are also limited by the associated mechanical and electrical subsystems that constitute 
an information storage system. 

Channel coding and modulation are essential components of a well-designed digital 
magnetic or optical storage system. In the readback process, the signal is demodulated 
and the added redundancy introduced by the channel encoder is used to correct errors 
in the readback signal. 


1.3 MATHEMATICAL MODELS FOR COMMUNICATION CHANNELS

In the design of communication systems for transmitting information through physical 
channels, we find it convenient to construct mathematical models that reflect the most 
important characteristics of the transmission medium. Then, the mathematical model for 
the channel is used in the design of the channel encoder and modulator at the transmitter 
and the demodulator and channel decoder at the receiver. Below, we provide a brief 
description of the channel models that are frequently used to characterize many of the 
physical channels that we encounter in practice. 

The Additive Noise Channel 

The simplest mathematical model for a communication channel is the additive noise 
channel, illustrated in Figure 1.3-1. In this model, the transmitted signal s(t) is corrupted
by an additive random noise process n(t). Physically, the additive noise process may 
arise from electronic components and amplifiers at the receiver of the communication 
system or from interference encountered in transmission (as in the case of radio signal 
transmission). 

If the noise is introduced primarily by electronic components and amplifiers at the 
receiver, it may be characterized as thermal noise. This type of noise is characterized 
statistically as a Gaussian noise process. Hence, the resulting mathematical model 
for the channel is usually called the additive Gaussian noise channel. Because this 
channel model applies to a broad class of physical communication channels and because 
of its mathematical tractability, this is the predominant channel model used in our 
communication system analysis and design. Channel attenuation is easily incorporated 
into the model. When the signal undergoes attenuation in transmission through the
channel, the received signal is

$$r(t) = a\,s(t) + n(t) \tag{1.3-1}$$

where a is the attenuation factor.

FIGURE 1.3-1
The additive noise channel: r(t) = s(t) + n(t).

FIGURE 1.3-2
The linear filter channel with additive noise.


The Linear Filter Channel 

In some physical channels, such as wireline telephone channels, filters are used to en- 
sure that the transmitted signals do not exceed specified bandwidth limitations and thus 
do not interfere with one another. Such channels are generally characterized mathemat- 
ically as linear filter channels with additive noise, as illustrated in Figure 1.3-2. Hence, 
if the channel input is the signal s(t), the channel output is the signal

$$ \begin{aligned} r(t) &= s(t) \star c(t) + n(t) \\ &= \int_{-\infty}^{\infty} c(\tau)\, s(t - \tau)\, d\tau + n(t) \end{aligned} \tag{1.3-2} $$

where c(t) is the impulse response of the linear filter and ★ denotes convolution.



The Linear Time-Variant Filter Channel 

Physical channels such as underwater acoustic channels and ionospheric radio channels
that result in time-variant multipath propagation of the transmitted signal may be
characterized mathematically as time-variant linear filters. Such linear filters are
characterized by a time-variant channel impulse response $c(\tau; t)$, where $c(\tau; t)$ is the response
of the channel at time t due to an impulse applied at time $t - \tau$. Thus, $\tau$ represents the
“age” (elapsed-time) variable. The linear time-variant filter channel with additive noise
is illustrated in Figure 1.3-3. For an input signal s(t), the channel output signal is


$$ r(t) = s(t) \star c(\tau; t) + n(t) \tag{1.3-3} $$

A good model for multipath signal propagation through physical channels, such as
the ionosphere (at frequencies below 30 MHz) and mobile cellular radio channels, is a
special case of (1.3-3) in which the time-variant impulse response has the form

$$ c(\tau; t) = \sum_{k=1}^{L} a_k(t)\, \delta(\tau - \tau_k) \tag{1.3-4} $$

where the $\{a_k(t)\}$ represent the possibly time-variant attenuation factors for the L
multipath propagation paths and $\{\tau_k\}$ are the corresponding time delays. If (1.3-4) is
substituted into (1.3-3), the received signal has the form

$$ r(t) = \sum_{k=1}^{L} a_k(t)\, s(t - \tau_k) + n(t) \tag{1.3-5} $$

Hence, the received signal consists of L multipath components, where the kth component
is attenuated by $a_k(t)$ and delayed by $\tau_k$.
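
As an illustration only, the multipath model of (1.3-5) can be sketched in discrete time. The function name `multipath_channel`, the representation of each gain as a callable of the sample index, and the restriction to integer sample delays are assumptions made for this sketch; they are not part of the model above.

```python
def multipath_channel(s, gains, delays, noise=None):
    """Discrete-time sketch of (1.3-5): r[n] = sum_k a_k[n] * s[n - d_k] + noise[n].

    s      : list of input samples
    gains  : list of L callables, gains[k](n) returns a_k at sample n (assumed form)
    delays : list of L non-negative integer delays in samples (assumed on-grid)
    noise  : optional list of noise samples, one per output sample
    """
    n_out = len(s) + max(delays)
    r = [0.0] * n_out
    for a_k, d_k in zip(gains, delays):
        for n in range(n_out):
            m = n - d_k  # index of the delayed input sample
            if 0 <= m < len(s):
                r[n] += a_k(n) * s[m]
    if noise is not None:
        r = [x + w for x, w in zip(r, noise)]
    return r


# Two paths: a direct path and a weaker echo delayed by 3 samples, no noise.
s = [1.0, 0.0, 0.0, 0.0, 0.0]
r = multipath_channel(s, gains=[lambda n: 1.0, lambda n: 0.5], delays=[0, 3])
print(r)
```

With a unit impulse as input, the output simply lists the path gains at their delays, which is the sampled counterpart of the impulse response in (1.3-4).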

The three mathematical models described above adequately characterize the great 
majority of the physical channels encountered in practice. These three channel models 
are used in this text for the analysis and design of communication systems. 


■ 1.4 

A HISTORICAL PERSPECTIVE IN THE DEVELOPMENT 
OF DIGITAL COMMUNICATIONS 

It is remarkable that the earliest form of electrical communication, namely telegraphy, 
was a digital communication system. The electric telegraph was developed by Samuel 
Morse and was demonstrated in 1837. Morse devised the variable-length binary code 
in which letters of the English alphabet are represented by a sequence of dots and 
dashes (code words). In this code, more frequently occurring letters are represented by 
short code words, while letters occurring less frequently are represented by longer code 
words. Thus, the Morse code was the precursor of the variable-length source coding 
methods described in Chapter 6. 

Nearly 40 years later, in 1875, Emile Baudot devised a code for telegraphy in which
every letter was encoded into fixed-length binary code words of length 5. In the Baudot 
code, binary code elements are of equal length and designated as mark and space. 

Although Morse is responsible for the development of the first electrical digital 
communication system (telegraphy), the beginnings of what we now regard as modern 
digital communications stem from the work of Nyquist (1924), who investigated the 
problem of determining the maximum signaling rate that can be used over a telegraph 
channel of a given bandwidth without intersymbol interference. He formulated a model 
of a telegraph system in which a transmitted signal has the general form 


$$ s(t) = \sum_{n} a_n\, g(t - nT) \tag{1.4-1} $$

where g(t) represents a basic pulse shape and $\{a_n\}$ is the binary data sequence of {±1}
transmitted at a rate of 1/T bits/s. Nyquist set out to determine the optimum pulse shape
that was band-limited to W Hz and maximized the bit rate under the constraint that the
pulse caused no intersymbol interference at the sampling times kT, k = 0, ±1, ±2, ....
His studies led him to conclude that the maximum pulse rate is 2W pulses/s. This rate
is now called the Nyquist rate. Moreover, this pulse rate can be achieved by using
the pulses $g(t) = (\sin 2\pi W t)/(2\pi W t)$. This pulse shape allows recovery of the data
without intersymbol interference at the sampling instants. Nyquist’s result is equivalent
to a version of the sampling theorem for band-limited signals, which was later stated
precisely by Shannon (1948b). The sampling theorem states that a signal of bandwidth
W can be reconstructed from samples taken at the Nyquist rate of 2W samples/s using
the interpolation formula

$$ s(t) = \sum_{n} s\!\left(\frac{n}{2W}\right) \frac{\sin[2\pi W(t - n/2W)]}{2\pi W(t - n/2W)} \tag{1.4-2} $$

In light of Nyquist’s work, Hartley (1928) considered the issue of the amount 
of data that can be transmitted reliably over a band-limited channel when multiple 
amplitude levels are used. Because of the presence of noise and other interference, 
Hartley postulated that the receiver can reliably estimate the received signal amplitude 
to some accuracy, say $A_\delta$. This investigation led Hartley to conclude that there is a
maximum data rate that can be communicated reliably over a band-limited channel
when the maximum signal amplitude is limited to $A_{\max}$ (fixed power constraint) and
the amplitude resolution is $A_\delta$.

Another significant advance in the development of communications was the work 
of Kolmogorov (1939) and Wiener (1942), who considered the problem of estimating a 
desired signal waveform s(t) in the presence of additive noise n(t), based on observation
of the received signal r(t) = s(t) + n(t). This problem arises in signal demodulation.
Kolmogorov and Wiener determined the linear filter whose output is the best mean-square
approximation to the desired signal s(t). The resulting filter is called the optimum
linear (Kolmogorov-Wiener) filter.

Hartley’s and Nyquist’s results on the maximum transmission rate of digital information
were precursors to the work of Shannon (1948a, b), who established the
mathematical foundations for information transmission and derived the fundamental 
limits for digital communication systems. In his pioneering work, Shannon formulated 
the basic problem of reliable transmission of information in statistical terms, using 
probabilistic models for information sources and communication channels. Based on 
such a statistical formulation, he adopted a logarithmic measure for the information 
content of a source. He also demonstrated that the effect of a transmitter power con- 
straint, a bandwidth constraint, and additive noise can be associated with the channel 
and incorporated into a single parameter, called the channel capacity. For example, 
in the case of an additive white (spectrally flat) Gaussian noise interference, an ideal 
band-limited channel of bandwidth W has a capacity C given by

$$ C = W \log_2\!\left(1 + \frac{P}{W N_0}\right) \ \text{bits/s} \tag{1.4-3} $$

where P is the average transmitted power and $N_0$ is the power spectral density of the
additive noise. The significance of the channel capacity is as follows: If the information 
rate R from the source is less than C (R < C), then it is theoretically possible to achieve
reliable (error-free) transmission through the channel by appropriate coding. On the
other hand, if R > C, reliable transmission is not possible regardless of the amount of
signal processing performed at the transmitter and receiver. Thus, Shannon established
basic limits on communication of information and gave birth to a new field that is now
called information theory. 
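
To make the role of the capacity concrete, the short sketch below evaluates the standard expression for the ideal band-limited AWGN channel, $C = W \log_2(1 + P/(W N_0))$, and compares a source rate R against it. The function names and the numerical values (a 3000-Hz channel at a signal-to-noise ratio of 30 dB) are assumptions chosen only for illustration.

```python
import math


def awgn_capacity(W, P, N0):
    """Capacity in bits/s of an ideal band-limited AWGN channel.

    W  : bandwidth in Hz
    P  : average transmitted power
    N0 : noise power spectral density, so that P / (W * N0) is the SNR
    """
    return W * math.log2(1.0 + P / (W * N0))


def reliable_transmission_possible(R, W, P, N0):
    """True when the rate R (bits/s) is below the capacity C."""
    return R < awgn_capacity(W, P, N0)


# Assumed example values: W = 3000 Hz and P / (W * N0) = 1000, i.e., 30 dB.
W = 3000.0
N0 = 1e-9
P = 1000.0 * W * N0
C = awgn_capacity(W, P, N0)
print(f"C = {C:.1f} bits/s")
```

In this example a source rate of 20 kbits/s lies below C, whereas 40 kbits/s does not, so only the former can in principle be transmitted reliably.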

Another important contribution to the field of digital communication is the work
of Kotelnikov (1947), who provided a coherent analysis of the various digital commu- 
nication systems based on a geometrical approach. Kotelnikov’s approach was later 
expanded by Wozencraft and Jacobs (1965). 

Following Shannon’s publications came the classic work of Hamming (1950) on 
error-detecting and error-correcting codes to combat the detrimental effects of channel 
noise. Hamming’s work stimulated many researchers in the years that followed, and a 
variety of new and powerful codes were discovered, many of which are used today in 
the implementation of modern communication systems. 

The increase in demand for data transmission during the last four decades, coupled 
with the development of more sophisticated integrated circuits, has led to the develop- 
ment of very efficient and more reliable digital communication systems. In the course 
of these developments, Shannon’s original results and the generalization of his results 
on maximum transmission limits over a channel and on bounds on the performance 
achieved have served as benchmarks for any given communication system design. The 
theoretical limits derived by Shannon and other researchers that contributed to the de- 
velopment of information theory serve as an ultimate goal in the continuing efforts to 
design and develop more efficient digital communication systems. 

There have been many new advances in the area of digital communications follow- 
ing the early work of Shannon, Kotelnikov, and Hamming. Some of the most notable 
advances are the following: 

• The development of new block codes by Muller (1954), Reed (1954), Reed and 
Solomon (1960), Bose and Ray-Chaudhuri (1960a, b), and Goppa (1970, 1971). 

• The development of concatenated codes by Forney (1966a). 

• The development of computationally efficient decoding of Bose-Chaudhuri- 
Hocquenghem (BCH) codes, e.g., the Berlekamp-Massey algorithm (see Chien, 
1964; Berlekamp, 1968). 

• The development of convolutional codes and decoding algorithms by Wozencraft 
and Reiffen (1961), Fano (1963), Zigangirov (1966), Jelinek (1969), Forney (1970b,
1972, 1974), and Viterbi (1967, 1971). 

• The development of trellis-coded modulation by Ungerboeck (1982), Forney et al.
(1984), Wei (1987), and others. 

• The development of efficient source encoding algorithms for data compression, such
as those devised by Ziv and Lempel (1977, 1978), and Linde et al. (1980). 

• The development of low-density parity check (LDPC) codes and the sum-product 
decoding algorithm by Gallager (1963). 

• The development of turbo codes and iterative decoding by Berrou et al. (1993).


■ 1.5

OVERVIEW OF THE BOOK 

Chapter 2 presents a review of deterministic and random signal analysis. Our primary 
objectives in this chapter are to review basic notions in the theory of probability and 
random variables and to establish some necessary notation. 

Chapters 3 through 5 treat the geometric representation of various digital modula- 
tion signals, their demodulation, their error rate performance in additive, white Gaussian 
noise (AWGN) channels, and methods for synchronizing the receiver to the received 
signal waveforms. 

Chapters 6 to 8 treat the topics of source coding, channel coding and decoding, and 
basic information theoretic limits on channel capacity, source information rates, and 
channel coding rates. 

The design of efficient modulators and demodulators for linear filter channels with 
distortion is treated in Chapters 9 and 10. Channel equalization methods are described 
for mitigating the effects of channel distortion. 

Chapter 11 is focused on multichannel and multicarrier communication systems,
their efficient implementation, and their performance in AWGN channels. 

Chapter 12 presents an introduction to direct sequence and frequency hopped spread 
spectrum signals and systems and an evaluation of their performance under worst-case 
interference conditions. 

The design of signals and coding techniques for digital communication through 
fading multipath channels is the focus of Chapters 13 and 14. This material is especially 
relevant to the design and development of wireless communication systems. 

Chapter 15 treats the use of multiple transmit and receive antennas for improv- 
ing the performance of wireless communication systems through signal diversity and 
increasing the data rate via spatial multiplexing. The capacity of multiple antenna 
systems is evaluated and space-time codes are described for use in multiple antenna 
communication systems. 

Chapter 16 of this book presents an introduction to multiuser communication 
systems and multiple access methods. We consider detection algorithms for uplink 
transmission in which multiple users transmit data to a common receiver (a base 
station) and evaluate their performance. We also present algorithms for suppressing 
multiple access interference in a broadcast communication system in which a transmit- 
ter employing multiple antennas transmits different data sequences simultaneously to 
different users. 


■ 1.6 

BIBLIOGRAPHICAL NOTES AND REFERENCES 

There are several historical treatments regarding the development of radio and telecom- 
munications during the past century. These may be found in the books by McMahon 
(1984), Millman (1984), and Ryder and Fink (1984). We have already cited the classical
works of Nyquist (1924), Hartley (1928), Kotelnikov (1947), Shannon (1948), and
Hamming (1950), as well as some of the more important advances that have occurred
in the field since 1950. The collected papers by Shannon have been published by IEEE 
Press in a book edited by Sloane and Wyner (1993) and previously in Russia in a 
book edited by Dobrushin and Lupanov (1963). Other collected works published by 
the IEEE Press that might be of interest to the reader are Key Papers in the Development 
of Coding Theory, edited by Berlekamp (1974), and Key Papers in the Development of 
Information Theory, edited by Slepian (1974). 



Deterministic and Random Signal Analysis 


In this chapter we present the background material needed in the study of the following 
chapters. The analysis of deterministic and random signals and the study of different 
methods for their representation are the main topics of this chapter. In addition, we 
also introduce and study the main properties of some random variables frequently 
encountered in analysis of communication systems. We continue with a review of 
random processes, properties of lowpass and bandpass random processes, and series 
expansion of random processes. 

Throughout this chapter, and the book, we assume that the reader is familiar with 
the properties of the Fourier transform as summarized in Table 2.0-1 and the important 
Fourier transform pairs given in Table 2.0-2. 

In these tables we have used the following signal definitions. 

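$$ \Pi(t) = \begin{cases} 1 & |t| < \tfrac{1}{2} \\ \tfrac{1}{2} & t = \pm\tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases} \qquad \mathrm{sinc}(t) = \begin{cases} \dfrac{\sin(\pi t)}{\pi t} & t \neq 0 \\ 1 & t = 0 \end{cases} $$

and

$$ \Lambda(t) = \Pi(t) \star \Pi(t) = \begin{cases} t + 1 & -1 \le t < 0 \\ -t + 1 & 0 \le t < 1 \\ 0 & \text{otherwise} \end{cases} $$

The unit step signal is defined as

$$ u_{-1}(t) = \begin{cases} 1 & t > 0 \\ \tfrac{1}{2} & t = 0 \\ 0 & t < 0 \end{cases} $$
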

We also assume that the reader is familiar with elements of probability, random 
variables, and random processes as covered in standard texts such as Papoulis and Pillai 
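(2002), Leon-Garcia (1994), and Stark and Woods (2002).

TABLE 2.0-1
Table of Fourier Transform Properties

| Property | Signal | Fourier Transform |
|---|---|---|
| Linearity | $\alpha x_1(t) + \beta x_2(t)$ | $\alpha X_1(f) + \beta X_2(f)$ |
| Duality | $X(t)$ | $x(-f)$ |
| Conjugacy | $x^*(t)$ | $X^*(-f)$ |
| Time-scaling ($a \neq 0$) | $x(at)$ | $\frac{1}{\lvert a \rvert} X\!\left(\frac{f}{a}\right)$ |
| Time-shift | $x(t - t_0)$ | $e^{-j2\pi f t_0} X(f)$ |
| Modulation | $e^{j2\pi f_0 t} x(t)$ | $X(f - f_0)$ |
| Convolution | $x(t) \star y(t)$ | $X(f)Y(f)$ |
| Multiplication | $x(t)y(t)$ | $X(f) \star Y(f)$ |
| Differentiation | $\frac{d^n}{dt^n} x(t)$ | $(j2\pi f)^n X(f)$ |
| Differentiation in frequency | $t^n x(t)$ | $\left(\frac{j}{2\pi}\right)^n \frac{d^n}{df^n} X(f)$ |
| Integration | $\int_{-\infty}^{t} x(\tau)\, d\tau$ | $\frac{X(f)}{j2\pi f} + \frac{1}{2} X(0)\delta(f)$ |
| Parseval’s theorem | $\int_{-\infty}^{\infty} x(t)y^*(t)\, dt =$ | $\int_{-\infty}^{\infty} X(f)Y^*(f)\, df$ |
| Rayleigh’s theorem | $\int_{-\infty}^{\infty} \lvert x(t) \rvert^2\, dt =$ | $\int_{-\infty}^{\infty} \lvert X(f) \rvert^2\, df$ |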


■ 2.1 

BANDPASS AND LOWPASS SIGNAL REPRESENTATION 

As was discussed in Chap. 1, the process of communication consists of transmission
of the output of an information source over a communication channel. In almost all 
cases, the spectral characteristics of the information sequence do not directly match the 
spectral characteristics of the communication channel, and hence the information signal 
cannot be directly transmitted over the channel. In many cases the information signal 
is a low frequency (baseband) signal, and the available spectrum of the communication 
channel is at higher frequencies. Therefore, at the transmitter the information signal is 
translated to a higher frequency signal that matches the properties of the communication 
channel. This is the modulation process in which the baseband information signal is 
turned into a bandpass modulated signal. In this section we study the main properties 
of baseband and bandpass signals. 


2.1-1 Bandpass and Lowpass Signals 

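In this section we will show that any real, narrowband, and high frequency signal,
called a bandpass signal, can be represented in terms of a complex low frequency
signal, called the lowpass equivalent of the original bandpass signal. This result makes
it possible to work with the lowpass equivalents of bandpass signals instead of directly
working with them, thus greatly simplifying the handling of bandpass signals. That is
so because applying signal processing algorithms to lowpass signals is much easier due
to lower required sampling rates, which in turn result in lower rates of the sampled data.

TABLE 2.0-2
Table of Fourier Transform Pairs

| Time Domain | Frequency Domain |
|---|---|
| $\delta(t)$ | $1$ |
| $1$ | $\delta(f)$ |
| $\delta(t - t_0)$ | $e^{-j2\pi f t_0}$ |
| $e^{j2\pi f_0 t}$ | $\delta(f - f_0)$ |
| $\cos(2\pi f_0 t)$ | $\frac{1}{2}\delta(f - f_0) + \frac{1}{2}\delta(f + f_0)$ |
| $\sin(2\pi f_0 t)$ | $\frac{1}{2j}\delta(f - f_0) - \frac{1}{2j}\delta(f + f_0)$ |
| $\Pi(t)$ | $\mathrm{sinc}(f)$ |
| $\mathrm{sinc}(t)$ | $\Pi(f)$ |
| $\Lambda(t)$ | $\mathrm{sinc}^2(f)$ |
| $\mathrm{sinc}^2(t)$ | $\Lambda(f)$ |
| $e^{-\alpha t}u_{-1}(t)$, $\alpha > 0$ | $\frac{1}{\alpha + j2\pi f}$ |
| $t e^{-\alpha t}u_{-1}(t)$, $\alpha > 0$ | $\frac{1}{(\alpha + j2\pi f)^2}$ |
| $e^{-\alpha \lvert t \rvert}$ ($\alpha > 0$) | $\frac{2\alpha}{\alpha^2 + (2\pi f)^2}$ |
| $e^{-\pi t^2}$ | $e^{-\pi f^2}$ |
| $\mathrm{sgn}(t)$ | $\frac{1}{j\pi f}$ |
| $u_{-1}(t)$ | $\frac{1}{2}\delta(f) + \frac{1}{j2\pi f}$ |
| $\delta'(t)$ | $j2\pi f$ |
| $\delta^{(n)}(t)$ | $(j2\pi f)^n$ |
| $\frac{1}{t}$ | $-j\pi\, \mathrm{sgn}(f)$ |
| $\sum_{n=-\infty}^{\infty} \delta(t - nT_0)$ | $\frac{1}{T_0} \sum_{n=-\infty}^{\infty} \delta\!\left(f - \frac{n}{T_0}\right)$ |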

The Fourier transform of a signal provides information about the frequency content,
or spectrum, of the signal. The Fourier transform of a real signal x(t) has Hermitian
symmetry, i.e., $X(-f) = X^*(f)$, from which we conclude that $|X(-f)| = |X(f)|$ and
$\angle X^*(f) = -\angle X(f)$. In other words, for real x(t), the magnitude of X(f) is even and
its phase is odd. Because of this symmetry, all information about the signal is in the
positive (or negative) frequencies, and in particular x(t) can be perfectly reconstructed
by specifying X(f) for $f \ge 0$. Based on this observation, for a real signal x(t), we
define the bandwidth as the smallest range of positive frequencies such that X(f) = 0
when |f| is outside this range. It is clear that the bandwidth of a real signal is one-half
of its frequency support set.

FIGURE 2.1-1
The spectrum of a real-valued lowpass (baseband) signal.

A lowpass, or baseband, signal is a signal whose spectrum is located around the 
zero frequency. For instance, speech, music, and video signals are all lowpass signals, 
although they have different spectral characteristics and bandwidths. Usually lowpass 
signals are low frequency signals, which means that in the time domain, they are slowly 
varying signals with no jumps or sudden variations. The bandwidth of a real lowpass 
signal is the minimum positive W such that X(f) = 0 outside [−W, +W]. For these
signals the frequency support, i.e., the range of frequencies for which X(f) ≠ 0, is
[−W, +W]. An example of the spectrum of a real-valued lowpass signal is shown in
Fig. 2.1-1. The solid line shows the magnitude spectrum |X(f)|, and the dashed line
indicates the phase spectrum $\angle X(f)$.

We also define the positive spectrum and the negative spectrum of a signal x(t) as

$$ X_+(f) = \begin{cases} X(f) & f > 0 \\ \tfrac{1}{2} X(0) & f = 0 \\ 0 & f < 0 \end{cases} \qquad X_-(f) = \begin{cases} X(f) & f < 0 \\ \tfrac{1}{2} X(0) & f = 0 \\ 0 & f > 0 \end{cases} $$

It is clear that $X_+(f) = X(f)u_{-1}(f)$, $X_-(f) = X(f)u_{-1}(-f)$, and $X(f) = X_+(f) + X_-(f)$.
For a real signal x(t), since X(f) is Hermitian, we have $X_-(f) = X_+^*(-f)$.

For a complex signal x(t), the spectrum X(f) is not symmetric; hence, the signal
cannot be reconstructed from the information in the positive frequencies only. For 
complex signals, we define the bandwidth as one-half of the entire range of frequencies 
over which the spectrum is nonzero, i.e., one-half of the frequency support of the signal. 
This definition is for consistency with the definition of bandwidth for real signals. With 
this definition we can state that in general and for all signals, real or complex, the 
bandwidth is defined as one-half of the frequency support. 

In practice, the spectral characteristics of the message signal and the communication 
channel do not always match, and it is required that the message signal be modulated 
by one of the many different modulation methods to match its spectral characteristics to
the spectral characteristics of the channel. In this process, the spectrum of the lowpass
message signal is translated to higher frequencies. The resulting modulated signal is a
bandpass signal.

FIGURE 2.1-2
The spectrum of a real-valued bandpass signal.

A bandpass signal is a real signal whose frequency content, or spectrum, is located 
around some frequency $\pm f_0$ which is far from zero. More formally, we define a bandpass
signal to be a real signal x(t) for which there exist positive $f_0$ and W such that the
positive spectrum of X(f), i.e., $X_+(f)$, is nonzero only in the interval $[f_0 - W/2, f_0 + W/2]$,
where $W/2 < f_0$ (in practice, usually $W \ll f_0$). The frequency $f_0$ is called the
central frequency. Obviously, the bandwidth of x(t) is at most equal to W. Bandpass
signals are usually high frequency signals which are characterized by rapid variations 
in the time domain. 

An example of the spectrum of a bandpass signal is shown in Figure 2.1-2. Note 
that since the signal x(t) is real, its magnitude spectrum (solid line) is even, and its phase
spectrum (dashed line) is odd. Also, note that the central frequency $f_0$ is not necessarily
the midband frequency of the bandpass signal. Due to the symmetry of the spectrum,
$X_+(f)$ has all the information that is necessary to reconstruct X(f). In fact we can write

$$ X(f) = X_+(f) + X_-(f) = X_+(f) + X_+^*(-f) \tag{2.1-2} $$

which means that knowledge of $X_+(f)$ is sufficient to reconstruct X(f).


2.1-2 Lowpass Equivalent of Bandpass Signals 

We start by defining the analytic signal, or the pre-envelope, corresponding to x(t) as
the signal $x_+(t)$ whose Fourier transform is $X_+(f)$. This signal contains only positive
frequency components, and its spectrum is not Hermitian. Therefore, in general, $x_+(t)$
is a complex signal. We have

$$ \begin{aligned} x_+(t) &= \mathcal{F}^{-1}[X_+(f)] \\ &= \mathcal{F}^{-1}[X(f)u_{-1}(f)] \\ &= x(t) \star \left(\tfrac{1}{2}\delta(t) + \frac{j}{2\pi t}\right) \\ &= \tfrac{1}{2}x(t) + \tfrac{j}{2}\hat{x}(t) \end{aligned} \tag{2.1-3} $$

where $\hat{x}(t) = \frac{1}{\pi t} \star x(t)$ is the Hilbert transform of x(t). The Hilbert transform of x(t) is
obtained by introducing a phase shift of $-\pi/2$ at positive frequency components of x(t)
and $\pi/2$ at negative frequencies. In the frequency domain we have

$$ \mathcal{F}[\hat{x}(t)] = -j\,\mathrm{sgn}(f)X(f) \tag{2.1-4} $$

Some of the properties of the Hilbert transform will be covered in the problems at the
end of this chapter.

FIGURE 2.1-3
The spectrum of the lowpass equivalent of the signal shown in Figure 2.1-2.

Now we define $x_l(t)$, the lowpass equivalent, or the complex envelope, of x(t), as
the signal whose spectrum is given by $2X_+(f + f_0)$, i.e.,

$$ X_l(f) = 2X_+(f + f_0) = 2X(f + f_0)u_{-1}(f + f_0) \tag{2.1-5} $$

Obviously the spectrum of $x_l(t)$ is located around the zero frequency, and therefore it is
in general a complex lowpass signal. This signal is called the lowpass equivalent or the
complex envelope of x(t). The spectrum of the lowpass equivalent of the signal shown
in Figure 2.1-2 is shown in Figure 2.1-3.

Applying the modulation theorem of the Fourier transform, we obtain

$$ \begin{aligned} x_l(t) &= \mathcal{F}^{-1}[X_l(f)] \\ &= 2x_+(t)e^{-j2\pi f_0 t} \\ &= (x(t) + j\hat{x}(t))e^{-j2\pi f_0 t} && (2.1\text{-}6) \\ &= (x(t)\cos 2\pi f_0 t + \hat{x}(t)\sin 2\pi f_0 t) \\ &\quad + j(\hat{x}(t)\cos 2\pi f_0 t - x(t)\sin 2\pi f_0 t) && (2.1\text{-}7) \end{aligned} $$

From Equation 2.1-6 we can write

$$ x(t) = \mathrm{Re}\left[x_l(t)e^{j2\pi f_0 t}\right] \tag{2.1-8} $$

This relation expresses any bandpass signal in terms of its lowpass equivalent. Using
Equations 2.1-2 and 2.1-5, we can write

$$ X(f) = \tfrac{1}{2}\left[X_l(f - f_0) + X_l^*(-f - f_0)\right] \tag{2.1-9} $$

Equations 2.1-8, 2.1-9, 2.1-5, and 2.1-7 express x(t) and $x_l(t)$ in terms of each other
in the time and frequency domains.

The real and imaginary parts of $x_l(t)$ are called the in-phase component and the
quadrature component of x(t), respectively, and are denoted by $x_i(t)$ and $x_q(t)$. Both
$x_i(t)$ and $x_q(t)$ are real-valued lowpass signals, and we have

$$ x_l(t) = x_i(t) + jx_q(t) \tag{2.1-10} $$

Comparing Equations 2.1-10 and 2.1-7, we conclude that

$$ \begin{aligned} x_i(t) &= x(t)\cos 2\pi f_0 t + \hat{x}(t)\sin 2\pi f_0 t \\ x_q(t) &= \hat{x}(t)\cos 2\pi f_0 t - x(t)\sin 2\pi f_0 t \end{aligned} \tag{2.1-11} $$

Solving Equation 2.1-11 for x(t) and $\hat{x}(t)$ gives

$$ \begin{aligned} x(t) &= x_i(t)\cos 2\pi f_0 t - x_q(t)\sin 2\pi f_0 t \\ \hat{x}(t) &= x_q(t)\cos 2\pi f_0 t + x_i(t)\sin 2\pi f_0 t \end{aligned} \tag{2.1-12} $$

Equation 2.1-12 shows that any bandpass signal x(t) can be expressed in terms of
two lowpass signals, namely, its in-phase and quadrature components.

Equation 2.1-10 expresses $x_l(t)$ in terms of its real and imaginary parts. We can
write a similar relation in polar coordinates expressing $x_l(t)$ in terms of its magnitude
and phase. If we define the envelope and phase of x(t), denoted by $r_x(t)$ and $\theta_x(t)$,
respectively, by

$$ r_x(t) = \sqrt{x_i^2(t) + x_q^2(t)} \tag{2.1-13} $$

$$ \theta_x(t) = \arctan\frac{x_q(t)}{x_i(t)} \tag{2.1-14} $$

we have

$$ x_l(t) = r_x(t)e^{j\theta_x(t)} \tag{2.1-15} $$

Substituting this result into Equation 2.1-8 gives

$$ x(t) = \mathrm{Re}\left[r_x(t)e^{j(2\pi f_0 t + \theta_x(t))}\right] \tag{2.1-16} $$

resulting in

$$ x(t) = r_x(t)\cos(2\pi f_0 t + \theta_x(t)) \tag{2.1-17} $$

A bandpass signal and its envelope are shown in Figure 2.1-4.
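
The equivalence of the polar form 2.1-17 and the in-phase/quadrature form 2.1-12 can be checked pointwise with a short sketch. The function names and the sample values below are hypothetical. The sketch uses `math.atan2` in place of a plain arctangent of $x_q/x_i$ so that the phase falls in the correct quadrant when $x_i(t) < 0$; this is a choice made here, not something stated above.

```python
import math


def envelope_and_phase(x_i, x_q):
    """Envelope (Eq. 2.1-13) and phase (Eq. 2.1-14) from the I and Q components."""
    r = math.hypot(x_i, x_q)
    theta = math.atan2(x_q, x_i)  # four-quadrant arctangent (choice made in this sketch)
    return r, theta


def bandpass_from_iq(x_i, x_q, f0, t):
    """Eq. 2.1-12: x(t) = x_i(t) cos(2 pi f0 t) - x_q(t) sin(2 pi f0 t)."""
    w = 2.0 * math.pi * f0 * t
    return x_i * math.cos(w) - x_q * math.sin(w)


def bandpass_from_polar(r, theta, f0, t):
    """Eq. 2.1-17: x(t) = r_x(t) cos(2 pi f0 t + theta_x(t))."""
    return r * math.cos(2.0 * math.pi * f0 * t + theta)


f0 = 100.0  # assumed carrier frequency in Hz
# Hypothetical (x_i, x_q, t) triples, including one with x_i < 0.
samples = [(-0.8, 0.3, 0.0012), (0.5, -1.2, 0.0507), (0.0, 1.0, 0.25)]
max_err = 0.0
for x_i, x_q, t in samples:
    r, theta = envelope_and_phase(x_i, x_q)
    diff = bandpass_from_iq(x_i, x_q, f0, t) - bandpass_from_polar(r, theta, f0, t)
    max_err = max(max_err, abs(diff))
print(max_err)
```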



FIGURE 2.1-4 

A bandpass signal. The dashed curve denotes the envelope.

It is important to note that $x_l(t)$, and consequently $x_i(t)$, $x_q(t)$, $r_x(t)$, and $\theta_x(t)$,
depends on the choice of the central frequency $f_0$. For a given bandpass signal x(t),
different values of $f_0$, as long as $X_+(f)$ is nonzero only in the interval $[f_0 - W/2, f_0 + W/2]$,
where $W/2 < f_0$, yield different lowpass signals $x_l(t)$. Therefore, it makes
more sense to define the lowpass equivalent of a bandpass signal with respect to a
specific $f_0$. Since in most cases the choice of $f_0$ is clear, we usually do not make this
distinction.

Equations 2.1-12 and 2.1-17 provide two methods for representing a bandpass 
signal x(t) in terms of two lowpass signals, one in terms of the in-phase and quadrature 
components and one in terms of the envelope and the phase. The two relations given in 
Equations 2.1-8 and 2.1-12 that express the bandpass signal in terms of the lowpass 
component(s) define the modulation process, i.e., the process of going from lowpass to 
bandpass. The system that implements this process is called a modulator. The structure 
of a general modulator implementing Equations 2.1-8 and 2.1-12 is shown in
Figure 2.1-5(a) and (b). In this figure double lines and double blocks indicate complex
values and operations.
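
The two modulator structures can be sketched as functions acting on a single time instant, one implementing Equation 2.1-8 with complex arithmetic and one implementing Equation 2.1-12 with real arithmetic only. The function names and the lowpass signal used in the check are assumptions for illustration.

```python
import cmath
import math


def modulate_complex(x_l, f0, t):
    """Complex modulator, Eq. 2.1-8: x(t) = Re[x_l(t) exp(j 2 pi f0 t)]."""
    return (x_l * cmath.exp(2j * math.pi * f0 * t)).real


def modulate_real(x_i, x_q, f0, t):
    """Real modulator, Eq. 2.1-12: x(t) = x_i(t) cos(2 pi f0 t) - x_q(t) sin(2 pi f0 t)."""
    w = 2.0 * math.pi * f0 * t
    return x_i * math.cos(w) - x_q * math.sin(w)


def example_lowpass(t):
    """Hypothetical complex lowpass signal used only to exercise the two modulators."""
    return complex(1.0 + 0.5 * t, math.sin(2.0 * math.pi * 2.0 * t))


f0 = 50.0  # assumed carrier frequency in Hz
max_diff = 0.0
for k in range(200):
    t = k * 1e-3
    x_l = example_lowpass(t)
    a = modulate_complex(x_l, f0, t)
    b = modulate_real(x_l.real, x_l.imag, f0, t)
    max_diff = max(max_diff, abs(a - b))
print(max_diff)
```

Both functions produce the same bandpass samples up to rounding, which reflects the fact that the two structures are alternative realizations of the same operation.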

Similarly, Equations 2.1-7 and 2.1-11 represent how $x_l(t)$, or $x_i(t)$ and $x_q(t)$, can
be obtained from the bandpass signal x(t). This process, i.e., extracting the lowpass
signal from the bandpass signal, is called the demodulation process and is shown in
Figure 2.1-6(a) and (b). In these block diagrams the block denoted by $\mathcal{H}$ represents
a Hilbert transform, i.e., an LTI system with impulse response $h(t) = \frac{1}{\pi t}$ and transfer
function $H(f) = -j\,\mathrm{sgn}(f)$.
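
A minimal check of the demodulation relation 2.1-6 is possible for a signal whose Hilbert transform is known in closed form. The sketch below assumes a single tone $x(t) = A\cos(2\pi(f_0 + \Delta)t + \phi)$ with $f_0 + \Delta > 0$, whose Hilbert transform is $A\sin(2\pi(f_0 + \Delta)t + \phi)$; the expected lowpass equivalent is then $A e^{j(2\pi\Delta t + \phi)}$. The function name and the parameter values are hypothetical, and no general Hilbert transformer is implemented.

```python
import cmath
import math


def demodulate(x, x_hat, f0, t):
    """Eq. 2.1-6: x_l(t) = (x(t) + j x_hat(t)) exp(-j 2 pi f0 t)."""
    return (x + 1j * x_hat) * cmath.exp(-2j * math.pi * f0 * t)


# Assumed tone parameters: amplitude, carrier, frequency offset, phase.
A, f0, delta, phi = 2.0, 1000.0, 30.0, 0.4
max_err = 0.0
for k in range(50):
    t = k * 1e-4
    w = 2.0 * math.pi * (f0 + delta) * t + phi
    x = A * math.cos(w)
    x_hat = A * math.sin(w)  # Hilbert transform of the cosine (positive frequency)
    x_l = demodulate(x, x_hat, f0, t)
    expected = A * cmath.exp(1j * (2.0 * math.pi * delta * t + phi))
    max_err = max(max_err, abs(x_l - expected))
print(max_err)
```

The real and imaginary parts of the result are the in-phase and quadrature components of Equation 2.1-11 for this tone.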



FIGURE 2.1-5
A complex (a) and real (b) modulator. A general representation for a modulator is
shown in (c).

FIGURE 2.1-6
A complex (a) and real (b) demodulator. A general representation for a demodulator is
shown in (c).


2.1-3 Energy Considerations 

In this section we study the relation between energy contents of the signals introduced 
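in the preceding pages. The energy of a signal x(t) is defined as

$$ \mathcal{E}_x = \int_{-\infty}^{\infty} |x(t)|^2\, dt \tag{2.1-18} $$

and by Rayleigh’s relation from Table 2.0-1 we can write

$$ \mathcal{E}_x = \int_{-\infty}^{\infty} |x(t)|^2\, dt = \int_{-\infty}^{\infty} |X(f)|^2\, df \tag{2.1-19} $$

Since there is no overlap between $X_+(f)$ and $X_-(f)$, we have $X_+(f)X_-(f) = 0$,
and hence

$$ \begin{aligned} \mathcal{E}_x &= \int_{-\infty}^{\infty} |X_+(f) + X_-(f)|^2\, df \\ &= \int_{-\infty}^{\infty} |X_+(f)|^2\, df + \int_{-\infty}^{\infty} |X_-(f)|^2\, df \\ &= 2\int_{-\infty}^{\infty} |X_+(f)|^2\, df \\ &= 2\mathcal{E}_{x_+} \end{aligned} \tag{2.1-20} $$

On the other hand,

$$ \begin{aligned} \mathcal{E}_x &= 2\int_{-\infty}^{\infty} |X_+(f)|^2\, df \\ &= 2\int_{-\infty}^{\infty} \left|\frac{X_l(f)}{2}\right|^2 df \\ &= \frac{1}{2}\mathcal{E}_{x_l} \end{aligned} \tag{2.1-21} $$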

This shows that the energy in the lowpass equivalent signal is twice the energy in the 
bandpass signal. 
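
This factor of 2 can be observed numerically. The sketch below assumes a real Gaussian envelope $x_l(t) = e^{-t^2}$ and a carrier at $f_0 = 10$ Hz, and approximates both energies with Riemann sums. A Gaussian is not strictly band-limited, so the corresponding x(t) is only approximately a bandpass signal; for these values the discrepancy is far below the numerical precision.

```python
import math

f0 = 10.0    # assumed carrier frequency in Hz
dt = 1e-3    # integration step in seconds
N = 6000     # the grid covers t in [-6, 6]
ts = [k * dt for k in range(-N, N + 1)]


def x_l(t):
    """Hypothetical lowpass equivalent: a real Gaussian pulse."""
    return math.exp(-t * t)


def x(t):
    """Bandpass signal from Eq. 2.1-8: x(t) = Re[x_l(t) exp(j 2 pi f0 t)]."""
    return x_l(t) * math.cos(2.0 * math.pi * f0 * t)


# Riemann-sum approximations of the energy integral of Eq. 2.1-18.
E_xl = sum(abs(x_l(t)) ** 2 for t in ts) * dt
E_x = sum(x(t) ** 2 for t in ts) * dt
ratio = E_xl / E_x
print(E_xl, E_x, ratio)
```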

We define the inner product of two signals x(t) and y(t) as

$$ \langle x(t), y(t) \rangle = \int_{-\infty}^{\infty} x(t)y^*(t)\, dt = \int_{-\infty}^{\infty} X(f)Y^*(f)\, df \tag{2.1-22} $$

where we have used Parseval’s relation from Table 2.0-1. Obviously

$$ \mathcal{E}_x = \langle x(t), x(t) \rangle \tag{2.1-23} $$

In Problem 2.2 we prove that if x(t) and y(t) are two bandpass signals with lowpass
equivalents $x_l(t)$ and $y_l(t)$ with respect to the same $f_0$, then

$$ \langle x(t), y(t) \rangle = \tfrac{1}{2}\mathrm{Re}\left[\langle x_l(t), y_l(t) \rangle\right] \tag{2.1-24} $$

The complex quantity $\rho_{x,y}$, called the cross-correlation coefficient of x(t) and y(t), is
defined as

$$ \rho_{x,y} = \frac{\langle x(t), y(t) \rangle}{\sqrt{\mathcal{E}_x \mathcal{E}_y}} \tag{2.1-25} $$

and represents the normalized inner product between two signals. From $\mathcal{E}_{x_l} = 2\mathcal{E}_x$ and
Equation 2.1-24 we can conclude that if x(t) and y(t) are bandpass signals with the
same $f_0$, then

$$ \rho_{x,y} = \mathrm{Re}(\rho_{x_l,y_l}) \tag{2.1-26} $$

Two signals are orthogonal if their inner product (and consequently, their $\rho$) is
zero. Note that if $\rho_{x_l,y_l} = 0$, then using Equation 2.1-26, we have $\rho_{x,y} = 0$; but the
converse is not necessarily true. In other words, orthogonality in the baseband implies
orthogonality in the passband, but not vice versa.

EXAMPLE 2.1-1. Assume that m(t) is a real baseband signal with bandwidth W, and
define two signals $x(t) = m(t)\cos 2\pi f_0 t$ and $y(t) = m(t)\sin 2\pi f_0 t$, where $f_0 > W$.
Comparing these relations with Equation 2.1-12, we conclude that

$$ x_i(t) = m(t), \quad x_q(t) = 0, \qquad y_i(t) = 0, \quad y_q(t) = -m(t) $$

or, equivalently,

$$ x_l(t) = m(t), \qquad y_l(t) = -jm(t) $$

Note that here

$$ \langle x_l(t), y_l(t) \rangle = j\int_{-\infty}^{\infty} m^2(t)\, dt = j\mathcal{E}_m $$

Since $\mathcal{E}_{x_l} = \mathcal{E}_{y_l} = \mathcal{E}_m$, it follows that $\rho_{x_l,y_l} = j$. Therefore,

$$ \rho_{x,y} = \mathrm{Re}(\rho_{x_l,y_l}) = \mathrm{Re}(j) = 0 $$

This means that x(t) and y(t) are orthogonal, but their lowpass equivalents are not
orthogonal.
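
The conclusion of this example can be reproduced numerically. The sketch below assumes a Gaussian message $m(t) = e^{-t^2}$ (only approximately band-limited) and $f_0 = 10$ Hz, and approximates the inner product of Equation 2.1-22 by a Riemann sum; the helper name `inner` and all numerical values are choices made for this illustration.

```python
import math

f0 = 10.0    # assumed carrier frequency in Hz
dt = 1e-3    # integration step in seconds
N = 6000     # symmetric grid covering t in [-6, 6]
ts = [k * dt for k in range(-N, N + 1)]


def m(t):
    """Hypothetical real baseband message."""
    return math.exp(-t * t)


def inner(u, v):
    """Riemann-sum approximation of <u, v> = integral of u(t) v*(t) dt (Eq. 2.1-22)."""
    return sum(complex(u(t)) * complex(v(t)).conjugate() for t in ts) * dt


def x(t):
    return m(t) * math.cos(2.0 * math.pi * f0 * t)


def y(t):
    return m(t) * math.sin(2.0 * math.pi * f0 * t)


def x_l(t):
    return m(t)          # lowpass equivalent of x(t)


def y_l(t):
    return -1j * m(t)    # lowpass equivalent of y(t)


E_m = inner(m, m).real
ip_bandpass = inner(x, y)      # expected to be (numerically) zero
ip_lowpass = inner(x_l, y_l)   # expected to equal j * E_m
print(ip_bandpass, ip_lowpass, E_m)
```

The bandpass inner product vanishes while the lowpass inner product is purely imaginary and nonzero, in agreement with Equation 2.1-24.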


2.1-4 Lowpass Equivalent of a Bandpass System 

A bandpass system is a system whose transfer function is located around a frequency
$f_0$ (and its mirror image $-f_0$). More formally, we define a bandpass system as a system
whose impulse response h(t) is a bandpass signal. Since h(t) is bandpass, it has a
lowpass equivalent denoted by $h_l(t)$ where

$$ h(t) = \mathrm{Re}\left[h_l(t)e^{j2\pi f_0 t}\right] \tag{2.1-27} $$

If a bandpass signal x(t) passes through a bandpass system with impulse response
h(t), then obviously the output will be a bandpass signal y(t). The relation between the
spectra of the input and the output is given by

$$ Y(f) = X(f)H(f) \tag{2.1-28} $$

Using Equation 2.1-5, we have

$$ \begin{aligned} Y_l(f) &= 2Y(f + f_0)u_{-1}(f + f_0) \\ &= 2X(f + f_0)H(f + f_0)u_{-1}(f + f_0) \\ &= \tfrac{1}{2}\left[2X(f + f_0)u_{-1}(f + f_0)\right]\left[2H(f + f_0)u_{-1}(f + f_0)\right] \\ &= \tfrac{1}{2}X_l(f)H_l(f) \end{aligned} \tag{2.1-29} $$

where we have used the fact that for $f > -f_0$, which is the range of frequencies of
interest, $u_{-1}^2(f + f_0) = u_{-1}(f + f_0) = 1$. In the time domain we have

$$ y_l(t) = \tfrac{1}{2}x_l(t) \star h_l(t) \tag{2.1-30} $$

Equations 2.1-29 and 2.1-30 show that when a bandpass signal passes through a 
bandpass system, the input-output relation between the lowpass equivalents is very 
similar to the relation between the bandpass signals, the only difference being that for 
the lowpass equivalents a factor of \ is introduced. 
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Signal space (or vector) representation of signals is a very effective and useful tool in 
the analysis of digitally modulated signals. We cover this important approach in this 
section and show that any set of signals is equivalent to a set of vectors. We show that 
signals have the same basic properties of vectors. We study methods of determining an 
equivalent set of vectors for a set of signals and introduce the notion of signal space 
representation, or signal constellation, of a set of waveforms. 


2.2-1 Vector Space Concepts 


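A vector $\mathbf{v}$ in an n-dimensional space is characterized by its n components $v_1\ v_2\ \cdots\ v_n$. 
Let $\mathbf{v}$ denote a column vector, i.e., $\mathbf{v} = [v_1\ v_2\ \cdots\ v_n]^t$, where $\mathbf{A}^t$ denotes the transpose 
of matrix $\mathbf{A}$. The inner product of two n-dimensional vectors $\mathbf{v}_1 = [v_{11}\ v_{12}\ \cdots\ v_{1n}]^t$ 
and $\mathbf{v}_2 = [v_{21}\ v_{22}\ \cdots\ v_{2n}]^t$ is defined as 

$$\langle \mathbf{v}_1, \mathbf{v}_2\rangle = \mathbf{v}_1 \cdot \mathbf{v}_2 = \sum_{i=1}^{n} v_{1i}\,v_{2i}^{*} = \mathbf{v}_2^{H}\mathbf{v}_1 \tag{2.2-1}$$

where $\mathbf{A}^H$ denotes the Hermitian transpose of the matrix $\mathbf{A}$, i.e., the result of first 
transposing the matrix and then conjugating its elements. From the definition of the 
inner product of two vectors it follows that 

$$\langle \mathbf{v}_1, \mathbf{v}_2\rangle = \langle \mathbf{v}_2, \mathbf{v}_1\rangle^{*} \tag{2.2-2}$$

and therefore, 

$$\langle \mathbf{v}_1, \mathbf{v}_2\rangle + \langle \mathbf{v}_2, \mathbf{v}_1\rangle = 2\,\mathrm{Re}\left[\langle \mathbf{v}_1, \mathbf{v}_2\rangle\right] \tag{2.2-3}$$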

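A vector may also be represented as a linear combination of orthogonal unit vectors 
or an orthonormal basis $\mathbf{e}_i$, $1 \le i \le n$, i.e., 

$$\mathbf{v} = \sum_{i=1}^{n} v_i\,\mathbf{e}_i \tag{2.2-4}$$

where, by definition, a unit vector has length unity and $v_i$ is the projection of the vector 
$\mathbf{v}$ onto the unit vector $\mathbf{e}_i$, i.e., $v_i = \langle \mathbf{v}, \mathbf{e}_i\rangle$. Two vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal if 
$\langle \mathbf{v}_1, \mathbf{v}_2\rangle = 0$. More generally, a set of m vectors $\mathbf{v}_k$, $1 \le k \le m$, are orthogonal if 
$\langle \mathbf{v}_i, \mathbf{v}_j\rangle = 0$ for all $1 \le i, j \le m$ and $i \ne j$. The norm of a vector $\mathbf{v}$ is denoted by $\|\mathbf{v}\|$ 
and is defined as 

$$\|\mathbf{v}\| = \left(\langle \mathbf{v}, \mathbf{v}\rangle\right)^{1/2} = \sqrt{\sum_{i=1}^{n} |v_i|^2} \tag{2.2-5}$$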


which in the n-dimensional space is simply the length of the vector. A set of m vectors 
is said to be orthonormal if the vectors are orthogonal and each vector has a 
unit norm. A set of m vectors is said to be linearly independent if no one vector can be 
represented as a linear combination of the remaining vectors. Any two n-dimensional 
vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ satisfy the triangle inequality 

$$\|\mathbf{v}_1 + \mathbf{v}_2\| \le \|\mathbf{v}_1\| + \|\mathbf{v}_2\| \tag{2.2-6}$$

with equality if $\mathbf{v}_1$ and $\mathbf{v}_2$ are in the same direction, i.e., $\mathbf{v}_1 = a\mathbf{v}_2$ where a is a positive 
real scalar. The Cauchy-Schwarz inequality states that 

$$|\langle \mathbf{v}_1, \mathbf{v}_2\rangle| \le \|\mathbf{v}_1\| \cdot \|\mathbf{v}_2\| \tag{2.2-7}$$

with equality if $\mathbf{v}_1 = a\mathbf{v}_2$ for some complex scalar a. The norm square of the sum of 
two vectors may be expressed as 

$$\|\mathbf{v}_1 + \mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2 + 2\,\mathrm{Re}\left[\langle \mathbf{v}_1, \mathbf{v}_2\rangle\right] \tag{2.2-8}$$

If $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal, then $\langle \mathbf{v}_1, \mathbf{v}_2\rangle = 0$ and, hence, 

$$\|\mathbf{v}_1 + \mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2 \tag{2.2-9}$$

This is the Pythagorean relation for two orthogonal n-dimensional vectors. From matrix 
algebra, we recall that a linear transformation in an n-dimensional vector space is a 
matrix transformation of the form $\mathbf{v}' = \mathbf{A}\mathbf{v}$, where the matrix $\mathbf{A}$ transforms the vector 
$\mathbf{v}$ into some vector $\mathbf{v}'$. In the special case where $\mathbf{v}' = \lambda\mathbf{v}$, i.e., 

$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$$

where $\lambda$ is some scalar, the vector $\mathbf{v}$ is called an eigenvector of the transformation and 
$\lambda$ is the corresponding eigenvalue. 

Finally, let us review the Gram-Schmidt procedure for constructing a set of 
orthonormal vectors from a set of n-dimensional vectors $\mathbf{v}_i$, $1 \le i \le m$. We begin by 
arbitrarily selecting a vector from the set, say, $\mathbf{v}_1$. By normalizing its length, we obtain 
the first vector, say, 

$$\mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} \tag{2.2-10}$$

Next, we may select $\mathbf{v}_2$ and, first, subtract the projection of $\mathbf{v}_2$ onto $\mathbf{u}_1$. Thus, we obtain 

$$\mathbf{u}_2' = \mathbf{v}_2 - \langle \mathbf{v}_2, \mathbf{u}_1\rangle\,\mathbf{u}_1 \tag{2.2-11}$$

Then we normalize the vector $\mathbf{u}_2'$ to unit length. This yields 

$$\mathbf{u}_2 = \frac{\mathbf{u}_2'}{\|\mathbf{u}_2'\|} \tag{2.2-12}$$

The procedure continues by selecting $\mathbf{v}_3$ and subtracting the projections of $\mathbf{v}_3$ onto $\mathbf{u}_1$ 
and $\mathbf{u}_2$. Thus, we have 

$$\mathbf{u}_3' = \mathbf{v}_3 - \langle \mathbf{v}_3, \mathbf{u}_1\rangle\,\mathbf{u}_1 - \langle \mathbf{v}_3, \mathbf{u}_2\rangle\,\mathbf{u}_2 \tag{2.2-13}$$

Then the orthonormal vector $\mathbf{u}_3$ is 

$$\mathbf{u}_3 = \frac{\mathbf{u}_3'}{\|\mathbf{u}_3'\|} \tag{2.2-14}$$

By continuing this procedure, we construct a set of N orthonormal vectors, where 
$N \le \min(m, n)$. 
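
The vector procedure can be sketched in a few lines of code. This is a minimal illustration, not a numerically robust implementation: the function names `inner`, `norm`, and `gram_schmidt`, the tolerance `tol`, and the sample vectors are assumptions made for the example. A vector whose residual has (numerically) zero length is linearly dependent on the earlier ones and contributes no new basis vector, which is why N can be smaller than m.

```python
import math


def inner(v1, v2):
    """<v1, v2> = sum_i v1[i] * conj(v2[i]), as in Equation 2.2-1."""
    return sum(a * complex(b).conjugate() for a, b in zip(v1, v2))


def norm(v):
    """Length of v, as in Equation 2.2-5."""
    return math.sqrt(inner(v, v).real)


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        residual = [complex(c) for c in v]
        for u in basis:
            projection = inner(v, u)   # component of v along u
            residual = [r - projection * ui for r, ui in zip(residual, u)]
        length = norm(residual)
        if length > tol:               # skip linearly dependent vectors
            basis.append([r / length for r in residual])
    return basis


# Four 3-dimensional vectors (m = 4, n = 3), so at most N = 3 basis vectors.
vectors = [(1, 1, 0), (1, -1, 0), (1, 1, -1), (-1, -1, -1)]
basis = gram_schmidt(vectors)
print(len(basis))
```

The projection is always taken of the original vector onto the already constructed unit vectors, which mirrors Equations 2.2-11 and 2.2-13.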


2.2-2 Signal Space Concepts 


As in the case of vectors, we may develop a parallel treatment for a set of signals. The 
inner product of two generally complex-valued signals $x_1(t)$ and $x_2(t)$ is denoted by 
$\langle x_1(t), x_2(t)\rangle$ and defined as 

$$\langle x_1(t), x_2(t)\rangle = \int_{-\infty}^{\infty} x_1(t)\,x_2^{*}(t)\,dt \tag{2.2-15}$$

similar to Equation 2.1-22. The signals are orthogonal if their inner product is zero. 
The norm of a signal is defined as 

$$\|x(t)\| = \left(\int_{-\infty}^{\infty} |x(t)|^2\,dt\right)^{1/2} = \sqrt{\mathcal{E}_x} \tag{2.2-16}$$

where $\mathcal{E}_x$ is the energy in x(t). A set of m signals is orthonormal if they are orthogonal 
and their norms are all unity. A set of m signals is linearly independent if no signal can 
be represented as a linear combination of the remaining signals. The triangle inequality 
for two signals is simply 

$$\|x_1(t) + x_2(t)\| \le \|x_1(t)\| + \|x_2(t)\| \tag{2.2-17}$$

and the Cauchy-Schwarz inequality is 

$$|\langle x_1(t), x_2(t)\rangle| \le \|x_1(t)\| \cdot \|x_2(t)\| = \sqrt{\mathcal{E}_{x_1}\mathcal{E}_{x_2}} \tag{2.2-18}$$

or, equivalently, 

$$\left|\int_{-\infty}^{\infty} x_1(t)\,x_2^{*}(t)\,dt\right| \le \left|\int_{-\infty}^{\infty} |x_1(t)|^2\,dt\right|^{1/2} \left|\int_{-\infty}^{\infty} |x_2(t)|^2\,dt\right|^{1/2} \tag{2.2-19}$$

with equality when $x_2(t) = a\,x_1(t)$, where a is any complex number. 


2.2-3 Orthogonal Expansions of Signals 


In this section, we develop a vector representation for signal waveforms, and thus we 
demonstrate an equivalence between a signal waveform and its vector representation. 
Suppose that s(t) is a deterministic signal with finite energy 

$$\mathcal{E}_s = \int_{-\infty}^{\infty} |s(t)|^2\,dt \tag{2.2-20}$$


Furthermore, suppose that there exists a set of functions $\{\phi_n(t),\ n = 1, 2, \ldots, K\}$ that 
are orthonormal in the sense that 

$$\langle \phi_n(t), \phi_m(t)\rangle = \int_{-\infty}^{\infty} \phi_n(t)\,\phi_m^{*}(t)\,dt = \begin{cases} 1 & m = n \\ 0 & m \ne n \end{cases} \tag{2.2-21}$$

We may approximate the signal s(t) by a weighted linear combination of these 
functions, i.e., 

$$\hat{s}(t) = \sum_{k=1}^{K} s_k\,\phi_k(t) \tag{2.2-22}$$

where $\{s_k,\ 1 \le k \le K\}$ are the coefficients in the approximation of s(t). The 
approximation error incurred is 

$$e(t) = s(t) - \hat{s}(t)$$

Let us select the coefficients $\{s_k\}$ so as to minimize the energy $\mathcal{E}_e$ of the approximation 
error. Thus, 

$$\mathcal{E}_e = \int_{-\infty}^{\infty} |s(t) - \hat{s}(t)|^2\,dt \tag{2.2-23}$$

$$\phantom{\mathcal{E}_e} = \int_{-\infty}^{\infty} \left|s(t) - \sum_{k=1}^{K} s_k\,\phi_k(t)\right|^2 dt \tag{2.2-24}$$


The optimum coefficients in the series expansion of s(t) may be found by differentiating 
Equation 2.2-23 with respect to each of the coefficients $\{s_k\}$ and setting the first 
derivatives to zero. Alternatively, we may use a well-known result from estimation theory 
based on the mean square error criterion, which, simply stated, is that the minimum 
of $\mathcal{E}_e$ with respect to the $\{s_k\}$ is obtained when the error is orthogonal to each of the 
functions in the series expansion. Thus, 

$$\int_{-\infty}^{\infty} \left[s(t) - \sum_{k=1}^{K} s_k\,\phi_k(t)\right]\phi_n^{*}(t)\,dt = 0, \qquad n = 1, 2, \ldots, K \tag{2.2-25}$$

Since the functions $\{\phi_n(t)\}$ are orthonormal, Equation 2.2-25 reduces to 

$$s_n = \langle s(t), \phi_n(t)\rangle = \int_{-\infty}^{\infty} s(t)\,\phi_n^{*}(t)\,dt, \qquad n = 1, 2, \ldots, K \tag{2.2-26}$$


Thus, the coefficients are obtained by projecting the signal s(t) onto each of the 
functions $\{\phi_n(t)\}$. Consequently, $\hat{s}(t)$ is the projection of s(t) onto the K-dimensional 
signal space spanned by the functions $\{\phi_n(t)\}$, and therefore it is orthogonal to the error 
signal $e(t) = s(t) - \hat{s}(t)$, i.e., $\langle e(t), \hat{s}(t)\rangle = 0$. The minimum mean-square 
approximation error is 

$$\mathcal{E}_{\min} = \int_{-\infty}^{\infty} e(t)\,s^{*}(t)\,dt \tag{2.2-27}$$

$$\phantom{\mathcal{E}_{\min}} = \int_{-\infty}^{\infty} |s(t)|^2\,dt - \int_{-\infty}^{\infty} \sum_{k=1}^{K} s_k\,\phi_k(t)\,s^{*}(t)\,dt \tag{2.2-28}$$

$$\phantom{\mathcal{E}_{\min}} = \mathcal{E}_s - \sum_{k=1}^{K} |s_k|^2 \tag{2.2-29}$$

which is nonnegative, by definition. When the minimum mean square approximation 
error $\mathcal{E}_{\min} = 0$, 

$$\mathcal{E}_s = \sum_{k=1}^{K} |s_k|^2 = \int_{-\infty}^{\infty} |s(t)|^2\,dt \tag{2.2-30}$$

Under the condition that $\mathcal{E}_{\min} = 0$, we may express s(t) as 

$$s(t) = \sum_{k=1}^{K} s_k\,\phi_k(t) \tag{2.2-31}$$

where it is understood that equality of s(t) to its series expansion holds in the sense that 
the approximation error has zero energy. 

When every finite energy signal can be represented by a series expansion of the 
form in Equation 2.2-31 for which $\mathcal{E}_{\min} = 0$, the set of orthonormal functions $\{\phi_n(t)\}$ 
is said to be complete. 
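
A small numerical experiment makes the projection result concrete. The following sketch is illustrative only: the signal $s(t) = t^2$ on [0, 1], the two-function orthonormal set, and the helper `integrate` are assumptions chosen for the example. Because everything is real, the complex conjugates in Equation 2.2-26 are omitted. The code computes the coefficients by projection and compares the error energy obtained from Equation 2.2-29 with the directly integrated error energy.

```python
import math

N_STEPS = 20000
DT = 1.0 / N_STEPS


def integrate(f):
    """Midpoint-rule integral of f over [0, 1]."""
    return sum(f((i + 0.5) * DT) for i in range(N_STEPS)) * DT


def s(t):
    """Assumed test signal, zero outside [0, 1]."""
    return t * t


# Two functions that are orthonormal on [0, 1] (assumed for the example).
basis = [
    lambda t: 1.0,
    lambda t: math.sqrt(3.0) * (2.0 * t - 1.0),
]

# Coefficients by projection (real-valued case of Equation 2.2-26).
coeffs = [integrate(lambda t, phi=phi: s(t) * phi(t)) for phi in basis]

energy_s = integrate(lambda t: s(t) ** 2)
e_min_formula = energy_s - sum(c * c for c in coeffs)   # Equation 2.2-29


def s_hat(t):
    """Approximation of Equation 2.2-22."""
    return sum(c * phi(t) for c, phi in zip(coeffs, basis))


e_min_direct = integrate(lambda t: (s(t) - s_hat(t)) ** 2)
print(coeffs, e_min_formula, e_min_direct)
```

For this choice the exact values are $s_1 = 1/3$, $s_2 = \sqrt{3}/6$ and $\mathcal{E}_{\min} = 1/180$; the error is nonzero because $t^2$ does not lie in the span of the two basis functions.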


EXAMPLE 2.2-1. TRIGONOMETRIC FOURIER SERIES: Consider a finite energy real signal 
s(t) that is zero everywhere except in the range $0 \le t \le T$ and has a finite number 
of discontinuities in this interval. Its periodic extension can be represented in a Fourier 
series as 

$$s(t) = \sum_{k=0}^{\infty}\left(a_k \cos\frac{2\pi k t}{T} + b_k \sin\frac{2\pi k t}{T}\right) \tag{2.2-32}$$

where the coefficients $\{a_k, b_k\}$ that minimize the mean square error are given by 

$$\begin{aligned}
a_0 &= \frac{1}{T}\int_0^T s(t)\,dt \\
a_k &= \frac{2}{T}\int_0^T s(t)\cos\frac{2\pi k t}{T}\,dt, \qquad k = 1, 2, 3, \ldots \\
b_k &= \frac{2}{T}\int_0^T s(t)\sin\frac{2\pi k t}{T}\,dt, \qquad k = 1, 2, 3, \ldots
\end{aligned} \tag{2.2-33}$$

The set of functions $\{1/\sqrt{T},\ \sqrt{2/T}\cos 2\pi k t/T,\ \sqrt{2/T}\sin 2\pi k t/T\}$ is a complete 
set for the expansion of periodic signals on the interval [0, T], and, hence, the series 
expansion results in zero mean square error. 

EXAMPLE 2.2-2. EXPONENTIAL FOURIER SERIES: Consider a general finite energy signal 
s(t) (real or complex) that is zero everywhere except in the range $0 \le t \le T$ and 
has a finite number of discontinuities in this interval. Its periodic extension can be 
represented in an exponential Fourier series as 

$$s(t) = \sum_{n=-\infty}^{\infty} x_n\,e^{j2\pi \frac{n}{T} t} \tag{2.2-34}$$

where the coefficients $\{x_n\}$ that minimize the mean square error are given by 

$$x_n = \frac{1}{T}\int_{-\infty}^{\infty} s(t)\,e^{-j2\pi \frac{n}{T} t}\,dt \tag{2.2-35}$$

The set of functions $\{\sqrt{1/T}\,e^{j2\pi \frac{n}{T} t}\}$ is a complete set for expansion of periodic signals 
on the interval [0, T], and, hence, the series expansion results in zero mean square 
error. 


2.2-4 Gram-Schmidt Procedure 


Now suppose that we have a set of finite energy signal waveforms $\{s_m(t),\ m = 1, 2, \ldots, M\}$ 
and we wish to construct a set of orthonormal waveforms. The Gram-Schmidt 
orthogonalization procedure allows us to construct such a set. This procedure is similar 
to the one described in Section 2.2-1 for vectors. We begin with the first waveform 
$s_1(t)$, which is assumed to have energy $\mathcal{E}_1$. The first orthonormal waveform is simply 
constructed as 

$$\phi_1(t) = \frac{s_1(t)}{\sqrt{\mathcal{E}_1}} \tag{2.2-36}$$

Thus, $\phi_1(t)$ is simply $s_1(t)$ normalized to unit energy. The second waveform is 
constructed from $s_2(t)$ by first computing the projection of $s_2(t)$ onto $\phi_1(t)$, which is 

$$c_{21} = \langle s_2(t), \phi_1(t)\rangle = \int_{-\infty}^{\infty} s_2(t)\,\phi_1^{*}(t)\,dt \tag{2.2-37}$$

Then $c_{21}\phi_1(t)$ is subtracted from $s_2(t)$ to yield 

$$\gamma_2(t) = s_2(t) - c_{21}\,\phi_1(t) \tag{2.2-38}$$

This waveform is orthogonal to $\phi_1(t)$, but it does not have unit energy. If $\mathcal{E}_2$ denotes 
the energy of $\gamma_2(t)$, i.e., 

$$\mathcal{E}_2 = \int_{-\infty}^{\infty} |\gamma_2(t)|^2\,dt$$

the normalized waveform that is orthogonal to $\phi_1(t)$ is 

$$\phi_2(t) = \frac{\gamma_2(t)}{\sqrt{\mathcal{E}_2}} \tag{2.2-39}$$

In general, the orthogonalization of the kth function leads to 

$$\phi_k(t) = \frac{\gamma_k(t)}{\sqrt{\mathcal{E}_k}} \tag{2.2-40}$$

where 

$$\gamma_k(t) = s_k(t) - \sum_{i=1}^{k-1} c_{ki}\,\phi_i(t) \tag{2.2-41}$$

$$c_{ki} = \langle s_k(t), \phi_i(t)\rangle = \int_{-\infty}^{\infty} s_k(t)\,\phi_i^{*}(t)\,dt, \qquad i = 1, 2, \ldots, k-1 \tag{2.2-42}$$

$$\mathcal{E}_k = \int_{-\infty}^{\infty} |\gamma_k(t)|^2\,dt \tag{2.2-43}$$

Thus, the orthogonalization process is continued until all the M signal waveforms 
$\{s_m(t)\}$ have been exhausted and $N \le M$ orthonormal waveforms have been 
constructed. The dimensionality N of the signal space will be equal to M if all the signal 
waveforms are linearly independent, i.e., none of the signal waveforms is a linear 
combination of the other signal waveforms. 


EXAMPLE 2.2-3. Let us apply the Gram-Schmidt procedure to the set of four 
waveforms illustrated in Figure 2.2-1. The waveform $s_1(t)$ has energy $\mathcal{E}_1 = 2$, so that 

$$\phi_1(t) = \sqrt{\tfrac{1}{2}}\,s_1(t)$$

Next we observe that $c_{21} = 0$; hence, $s_2(t)$ and $\phi_1(t)$ are orthogonal. Therefore, 
$\phi_2(t) = s_2(t)/\sqrt{\mathcal{E}_2} = \sqrt{\tfrac{1}{2}}\,s_2(t)$. To obtain $\phi_3(t)$, we compute $c_{31}$ and $c_{32}$, which are 
$c_{31} = \sqrt{2}$ and $c_{32} = 0$. Thus, 

$$\gamma_3(t) = s_3(t) - \sqrt{2}\,\phi_1(t) = \begin{cases} -1 & 2 \le t \le 3 \\ 0 & \text{otherwise} \end{cases}$$

Since $\gamma_3(t)$ has unit energy, it follows that $\phi_3(t) = \gamma_3(t)$. Determining $\phi_4(t)$, we find 
that $c_{41} = -\sqrt{2}$, $c_{42} = 0$, and $c_{43} = 1$. Hence, 

$$\gamma_4(t) = s_4(t) + \sqrt{2}\,\phi_1(t) - \phi_3(t) = 0$$

Consequently, $s_4(t)$ is a linear combination of $\phi_1(t)$ and $\phi_3(t)$ and, hence, $\phi_4(t) = 0$. 
The three orthonormal functions are illustrated in Figure 2.2-1(b). 
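
The steps of Example 2.2-3 can be reproduced on sampled waveforms. The sketch below rests on an assumption: since the figure is not reproduced here, the four waveforms are taken to be piecewise constant on [0,1), [1,2) and [2,3) with the levels listed in the code, a choice that is consistent with the coefficients quoted in Examples 2.2-3 to 2.2-5. The helper names (`sample`, `inner`, `gram_schmidt`) and the sampling density are likewise illustrative.

```python
import math

SAMPLES_PER_UNIT = 100          # assumed sampling density
DT = 1.0 / SAMPLES_PER_UNIT


def sample(levels):
    """Sample a waveform that is constant on [0,1), [1,2) and [2,3)."""
    return [float(a) for a in levels for _ in range(SAMPLES_PER_UNIT)]


def inner(x, y):
    """Approximate the integral of x(t) y(t) dt for real sampled waveforms."""
    return sum(a * b for a, b in zip(x, y)) * DT


def gram_schmidt(signals, tol=1e-9):
    """Return (orthonormal waveforms, projection coefficients c_ki per signal)."""
    basis, coeffs = [], []
    for s_k in signals:
        c_k = [inner(s_k, phi) for phi in basis]             # Equation 2.2-42
        gamma = list(s_k)
        for c_ki, phi in zip(c_k, basis):                    # Equation 2.2-41
            gamma = [g - c_ki * p for g, p in zip(gamma, phi)]
        energy = inner(gamma, gamma)                         # Equation 2.2-43
        if energy > tol:                                     # skip dependent signals
            basis.append([g / math.sqrt(energy) for g in gamma])
        coeffs.append(c_k)
    return basis, coeffs


# Assumed levels of s1..s4 on the three unit intervals.
signals = [sample(lv) for lv in [(1, 1, 0), (1, -1, 0), (1, 1, -1), (-1, -1, -1)]]
basis, coeffs = gram_schmidt(signals)
print(len(basis), coeffs[2], coeffs[3])
```

Under these assumptions the procedure yields three orthonormal waveforms, with `coeffs[2]` close to $(\sqrt{2}, 0)$ and `coeffs[3]` close to $(-\sqrt{2}, 0, 1)$.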

Once we have constructed the set of orthonormal waveforms $\{\phi_n(t)\}$, we can express 
the M signals $\{s_m(t)\}$ as linear combinations of the $\{\phi_n(t)\}$. Thus, we may write 

$$s_m(t) = \sum_{n=1}^{N} s_{mn}\,\phi_n(t), \qquad m = 1, 2, \ldots, M \tag{2.2-44}$$

Based on the expression in Equation 2.2-44, each signal may be represented by the 
vector 

$$\mathbf{s}_m = [s_{m1}\ s_{m2}\ \cdots\ s_{mN}]^t \tag{2.2-45}$$

or, equivalently, as a point in the N-dimensional (in general, complex) signal space with 
coordinates $\{s_{mn},\ n = 1, 2, \ldots, N\}$. Therefore, a set of M signals $\{s_m(t)\}_{m=1}^{M}$ can be 
represented by a set of M vectors $\{\mathbf{s}_m\}_{m=1}^{M}$ in the N-dimensional space, where $N \le M$. 

The corresponding set of vectors is called the signal space representation, or 
constellation, of $\{s_m(t)\}_{m=1}^{M}$. If the original signals are real, then the corresponding vector 
representations are in $\mathbb{R}^N$; and if the signals are complex, then the vector representations 
are in $\mathbb{C}^N$. Figure 2.2-2 demonstrates the process of obtaining the vector equivalent 
from a signal (signal-to-vector mapping) and vice versa (vector-to-signal mapping). 
From the orthonormality of the basis $\{\phi_n(t)\}$ it follows that 

$$\mathcal{E}_m = \int_{-\infty}^{\infty} |s_m(t)|^2\,dt = \sum_{n=1}^{N} |s_{mn}|^2 = \|\mathbf{s}_m\|^2 \tag{2.2-46}$$

The energy in the mth signal is simply the square of the length of the vector or, 
equivalently, the square of the Euclidean distance from the origin to the point $\mathbf{s}_m$ in the 
N-dimensional space. Thus, any signal can be represented geometrically as a point in 
the signal space spanned by the orthonormal functions $\{\phi_n(t)\}$. From the orthonormality 
of the basis it also follows that 

$$\langle s_k(t), s_l(t)\rangle = \langle \mathbf{s}_k, \mathbf{s}_l\rangle \tag{2.2-47}$$

This shows that the inner product of two signals is equal to the inner product of the 
corresponding vectors. 

FIGURE 2.2-1 
Gram-Schmidt orthogonalization of the signal $\{s_m(t),\ m = 1, 2, 3, 4\}$ and the corresponding 
orthonormal basis. 

FIGURE 2.2-2 
Vector to signal (a), and signal to vector (b) mappings. 


EXAMPLE 2.2-4. Let us obtain the vector representation of the four signals shown in 
Figure 2.2-1(a) by using the orthonormal set of functions in Figure 2.2-1(b). Since 
the dimensionality of the signal space is N = 3, each signal is described by three 
components. The signal $s_1(t)$ is characterized by the vector $\mathbf{s}_1 = (\sqrt{2}, 0, 0)^t$. Similarly, 
the signals $s_2(t)$, $s_3(t)$, and $s_4(t)$ are characterized by the vectors $\mathbf{s}_2 = (0, \sqrt{2}, 0)^t$, 
$\mathbf{s}_3 = (\sqrt{2}, 0, 1)^t$, and $\mathbf{s}_4 = (-\sqrt{2}, 0, 1)^t$, respectively. These vectors are shown in 
Figure 2.2-3. Their lengths are $\|\mathbf{s}_1\| = \sqrt{2}$, $\|\mathbf{s}_2\| = \sqrt{2}$, $\|\mathbf{s}_3\| = \sqrt{3}$, and $\|\mathbf{s}_4\| = \sqrt{3}$, 
and the corresponding signal energies are $\mathcal{E}_k = \|\mathbf{s}_k\|^2$, k = 1, 2, 3, 4. 

FIGURE 2.2-3 
The four signal vectors represented as points in three-dimensional space. 

We have demonstrated that a set of M finite energy waveforms $\{s_m(t)\}$ can be 
represented by a weighted linear combination of orthonormal functions $\{\phi_n(t)\}$ of 
dimensionality $N \le M$. The functions $\{\phi_n(t)\}$ are obtained by applying the Gram-Schmidt 
orthogonalization procedure on $\{s_m(t)\}$. It should be emphasized, however, that the 
functions $\{\phi_n(t)\}$ obtained from the Gram-Schmidt procedure are not unique. If we 
alter the order in which the orthogonalization of the signals $\{s_m(t)\}$ is performed, the 
orthonormal waveforms will be different and the corresponding vector representation 
of the signals $\{s_m(t)\}$ will depend on the choice of the orthonormal functions $\{\phi_n(t)\}$. 
Nevertheless, the dimensionality of the signal space N will not change, and the vectors 
$\{\mathbf{s}_m\}$ will retain their geometric configuration; i.e., their lengths and their inner products 
will be invariant to the choice of the orthonormal functions $\{\phi_n(t)\}$. 

EXAMPLE 2.2-5. An alternative set of orthonormal functions for the four signals in 
Figure 2.2-1(a) is illustrated in Figure 2.2-4(a). By using these functions to expand 
$\{s_n(t)\}$, we obtain the corresponding vectors $\mathbf{s}_1 = (1, 1, 0)^t$, $\mathbf{s}_2 = (1, -1, 0)^t$, 
$\mathbf{s}_3 = (1, 1, -1)^t$, and $\mathbf{s}_4 = (-1, -1, -1)^t$, which are shown in Figure 2.2-4(b). Note that 
the vector lengths are identical to those obtained from the orthonormal functions $\{\phi_n(t)\}$. 
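
The invariance of lengths and inner products can be verified directly from the two sets of coordinates quoted in Examples 2.2-4 and 2.2-5. The short check below is a sketch; the function name `gram` (the table of all pairwise inner products) is an assumption made for the illustration.

```python
import math

R2 = math.sqrt(2.0)

# Coordinates from Example 2.2-4 (Gram-Schmidt basis).
vectors_a = [(R2, 0, 0), (0, R2, 0), (R2, 0, 1), (-R2, 0, 1)]
# Coordinates from Example 2.2-5 (alternative basis).
vectors_b = [(1, 1, 0), (1, -1, 0), (1, 1, -1), (-1, -1, -1)]


def gram(vectors):
    """Matrix of inner products <s_k, s_l> for real vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


gram_a = gram(vectors_a)
gram_b = gram(vectors_b)

# Largest discrepancy between corresponding inner products.
max_diff = max(
    abs(x - y) for row_a, row_b in zip(gram_a, gram_b) for x, y in zip(row_a, row_b)
)
print(max_diff)
```

The diagonal entries are the signal energies (2, 2, 3, 3) in both cases, and all off-diagonal entries agree as well, as Equation 2.2-47 requires.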


FIGURE 2.2-4 
An alternative set of orthonormal functions for the four signals in Figure 2.2-1(a) and the 
corresponding signal points. 

Bandpass and Lowpass Orthonormal Basis 

Let us consider the case in which the signal waveforms are bandpass and represented as 

$$s_m(t) = \mathrm{Re}\left[s_{ml}(t)\,e^{j2\pi f_0 t}\right], \qquad m = 1, 2, \ldots, M \tag{2.2-48}$$

where $\{s_{ml}(t)\}$ denotes the lowpass equivalent signals. Recall from Section 2.1-1 that if 
two lowpass equivalent signals are orthogonal, the corresponding bandpass signals are 
orthogonal too. Therefore, if $\{\phi_{nl}(t),\ n = 1, \ldots, N\}$ constitutes an orthonormal basis 
for the set of lowpass signals $\{s_{ml}(t)\}$, then the set $\{\phi_n(t),\ n = 1, \ldots, N\}$ where 

$$\phi_n(t) = \sqrt{2}\,\mathrm{Re}\left[\phi_{nl}(t)\,e^{j2\pi f_0 t}\right] \tag{2.2-49}$$

is a set of orthonormal signals, where $\sqrt{2}$ is a normalization factor to make sure each 
$\phi_n(t)$ has unit energy. However, this set is not necessarily an orthonormal basis for 
expansion of $\{s_m(t),\ m = 1, \ldots, M\}$. In other words, there is no guarantee that this set 
is a complete basis for expansion of the set of signals $\{s_m(t),\ m = 1, \ldots, M\}$. Here our 
goal is to see how an orthonormal basis for representation of bandpass signals can be 
obtained from an orthonormal basis used for representation of the lowpass equivalents 
of the bandpass signals. 

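Since we have 

$$s_{ml}(t) = \sum_{n=1}^{N} s_{mln}\,\phi_{nl}(t), \qquad m = 1, \ldots, M \tag{2.2-50}$$

where 

$$s_{mln} = \langle s_{ml}(t), \phi_{nl}(t)\rangle, \qquad m = 1, \ldots, M, \quad n = 1, \ldots, N \tag{2.2-51}$$

from Equations 2.2-48 and 2.2-50 we can write 

$$s_m(t) = \mathrm{Re}\left[\left(\sum_{n=1}^{N} s_{mln}\,\phi_{nl}(t)\right) e^{j2\pi f_0 t}\right], \qquad m = 1, \ldots, M \tag{2.2-52}$$

or 

$$s_m(t) = \mathrm{Re}\left[\sum_{n=1}^{N} s_{mln}\,\phi_{nl}(t)\right]\cos 2\pi f_0 t - \mathrm{Im}\left[\sum_{n=1}^{N} s_{mln}\,\phi_{nl}(t)\right]\sin 2\pi f_0 t \tag{2.2-53}$$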

In Problem 2.6 we will see that when an orthonormal set of signals $\{\phi_{nl}(t),\ n = 1, \ldots, N\}$ 
constitutes an N-dimensional complex basis for representation of $\{s_{ml}(t),\ m = 1, \ldots, M\}$, 
then the set $\{\phi_n(t), \tilde{\phi}_n(t),\ n = 1, \ldots, N\}$, where 

$$\begin{aligned}
\phi_n(t) &= \sqrt{2}\,\mathrm{Re}\left[\phi_{nl}(t)\,e^{j2\pi f_0 t}\right] = \sqrt{2}\,\phi_{ni}(t)\cos 2\pi f_0 t - \sqrt{2}\,\phi_{nq}(t)\sin 2\pi f_0 t \\
\tilde{\phi}_n(t) &= -\sqrt{2}\,\mathrm{Im}\left[\phi_{nl}(t)\,e^{j2\pi f_0 t}\right] = -\sqrt{2}\,\phi_{ni}(t)\sin 2\pi f_0 t - \sqrt{2}\,\phi_{nq}(t)\cos 2\pi f_0 t
\end{aligned} \tag{2.2-54}$$

constitutes a 2N-dimensional orthonormal basis that is sufficient for representation of 
the M bandpass signals 

$$s_m(t) = \mathrm{Re}\left[s_{ml}(t)\,e^{j2\pi f_0 t}\right], \qquad m = 1, \ldots, M \tag{2.2-55}$$

In some cases not all basis functions in the set given by Equation 2.2-54 are 
necessary, and only a subset of them would be sufficient to expand the bandpass signals. 
In Problem 2.7 we will further show that 

$$\tilde{\phi}(t) = -\hat{\phi}(t)$$

where $\hat{\phi}(t)$ denotes the Hilbert transform of $\phi(t)$. 
From Equation 2.2-52 we have 

$$s_m(t) = \mathrm{Re}\left[\left(\sum_{n=1}^{N} s_{mln}\,\phi_{nl}(t)\right) e^{j2\pi f_0 t}\right] = \sum_{n=1}^{N} \mathrm{Re}\left[\left(s_{mln}\,\phi_{nl}(t)\right) e^{j2\pi f_0 t}\right] \tag{2.2-56}$$

$$\phantom{s_m(t)} = \sum_{n=1}^{N}\left[\frac{s_{mln}^{(r)}}{\sqrt{2}}\,\phi_n(t) + \frac{s_{mln}^{(i)}}{\sqrt{2}}\,\tilde{\phi}_n(t)\right] \tag{2.2-57}$$

where we have assumed that $s_{mln} = s_{mln}^{(r)} + j\,s_{mln}^{(i)}$. Equations 2.2-54 and 2.2-57 show 
how a bandpass signal can be expanded in terms of the basis used for expansion of its 
lowpass equivalent. In general, lowpass signals can be represented by an N-dimensional 
complex vector, and the corresponding bandpass signal can be represented by 
2N-dimensional real vectors. If the complex vector 

$$\mathbf{s}_{ml} = (s_{ml1}, s_{ml2}, \ldots, s_{mlN})^t$$

is a vector representation for the lowpass signal $s_{ml}(t)$ using the lowpass basis 
$\{\phi_{nl}(t),\ n = 1, \ldots, N\}$, then the vector 

$$\mathbf{s}_m = \left(\frac{s_{ml1}^{(r)}}{\sqrt{2}}, \frac{s_{ml2}^{(r)}}{\sqrt{2}}, \ldots, \frac{s_{mlN}^{(r)}}{\sqrt{2}}, \frac{s_{ml1}^{(i)}}{\sqrt{2}}, \frac{s_{ml2}^{(i)}}{\sqrt{2}}, \ldots, \frac{s_{mlN}^{(i)}}{\sqrt{2}}\right)^t \tag{2.2-58}$$

is a vector representation of the bandpass signal 

$$s_m(t) = \mathrm{Re}\left[s_{ml}(t)\,e^{j2\pi f_0 t}\right]$$

when the bandpass basis $\{\phi_n(t), \tilde{\phi}_n(t),\ n = 1, \ldots, N\}$ given by Equations 2.2-54 and 
2.2-57 is used. 
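
The mapping of Equation 2.2-58 is easy to express in code. The following is a minimal sketch; the function name `bandpass_vector` and the sample coefficients are hypothetical. It also checks that the squared length of the real 2N-vector is half that of the complex N-vector, which matches the relation $\mathcal{E}_{x_l} = 2\mathcal{E}_x$ between lowpass and bandpass energies used earlier in the chapter.

```python
import math


def bandpass_vector(lowpass_vector):
    """Map (s_ml1, ..., s_mlN) to the real 2N-vector of Equation 2.2-58."""
    scale = 1.0 / math.sqrt(2.0)
    real_parts = [complex(c).real * scale for c in lowpass_vector]
    imag_parts = [complex(c).imag * scale for c in lowpass_vector]
    return real_parts + imag_parts      # real parts first, then imaginary parts


# Assumed lowpass coefficients for one signal (N = 3).
s_ml = [1 + 2j, -3j, 0.5]
s_m = bandpass_vector(s_ml)

energy_lowpass = sum(abs(c) ** 2 for c in s_ml)
energy_bandpass = sum(c * c for c in s_m)
print(s_m, energy_lowpass, energy_bandpass)
```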


EXAMPLE 2.2-6. Let us assume M bandpass signals are defined by 

$$s_m(t) = \mathrm{Re}\left[A_m\,g(t)\,e^{j2\pi f_0 t}\right] \tag{2.2-59}$$

where the $A_m$'s are arbitrary complex numbers and g(t) is a real lowpass signal with energy 
$\mathcal{E}_g$. The lowpass equivalent signals are given by 

$$s_{ml}(t) = A_m\,g(t)$$

and therefore the unit-energy lowpass signal $\phi_l(t)$ defined by 

$$\phi_l(t) = \frac{g(t)}{\sqrt{\mathcal{E}_g}}$$

is sufficient to expand all $s_{ml}(t)$'s. We have 

$$s_{ml}(t) = A_m\sqrt{\mathcal{E}_g}\,\phi_l(t)$$

thus, corresponding to each $s_{ml}(t)$ we have a single complex scalar 
$A_m\sqrt{\mathcal{E}_g} = \left(A_m^{(r)} + jA_m^{(i)}\right)\sqrt{\mathcal{E}_g}$; i.e., the lowpass signals constitute one complex dimension (or, 
equivalently, two real dimensions). From Equation 2.2-54 we conclude that 

$$\phi(t) = \sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\cos 2\pi f_0 t, \qquad \tilde{\phi}(t) = -\sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\sin 2\pi f_0 t$$

can be used as a basis for expansion of the bandpass signals. 
Using this basis and Equation 2.2-57, we have 

$$s_m(t) = A_m^{(r)}\sqrt{\frac{\mathcal{E}_g}{2}}\,\phi(t) + A_m^{(i)}\sqrt{\frac{\mathcal{E}_g}{2}}\,\tilde{\phi}(t) = A_m^{(r)}\,g(t)\cos 2\pi f_0 t - A_m^{(i)}\,g(t)\sin 2\pi f_0 t$$

which agrees with the straightforward expansion of Equation 2.2-59. Note that in the 
special case where all $A_m$'s are real, $\phi(t)$ is sufficient to represent the bandpass signals 
and $\tilde{\phi}(t)$ is not necessary. 


2.3 SOME USEFUL RANDOM VARIABLES 

In subsequent chapters, we shall encounter several different types of random variables. 
In this section we list these frequently encountered random variables, their probability 
density functions (PDFs), their cumulative distribution functions (CDFs), and their 
moments. Our main emphasis will be on the Gaussian random variable and many 
random variables that are derived from the Gaussian random variable. 

The Bernoulli Random Variable 

The Bernoulli random variable is a discrete binary-valued random variable taking values 
1 and 0 with probabilities p and 1 - p, respectively. Therefore the probability mass 
function (PMF) for this random variable is given by 

$$P[X = 1] = p, \qquad P[X = 0] = 1 - p \tag{2.3-1}$$

The mean and variance of this random variable are given by 

$$E[X] = p, \qquad \mathrm{VAR}[X] = p(1 - p) \tag{2.3-2}$$

The Binomial Random Variable 

The binomial random variable models the sum of n independent Bernoulli random 
variables with common parameter p. The PMF of this random variable is given by 

$$P[X = k] = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad k = 0, 1, \ldots, n \tag{2.3-3}$$

For this random variable we have 

$$E[X] = np, \qquad \mathrm{VAR}[X] = np(1 - p) \tag{2.3-4}$$


This random variable models, for instance, the number of errors when n bits are trans- 
mitted over a communication channel and the probability of error for each bit is p. 
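
As a quick illustration of the bit-error interpretation, the sketch below evaluates the binomial PMF of Equation 2.3-3 for a block of bits and checks the mean and variance of Equation 2.3-4 by direct summation. The block length `n = 20`, the error probability `p = 0.1`, and the function name `binomial_pmf` are assumed values chosen only for the example.

```python
import math


def binomial_pmf(k, n, p):
    """P[X = k] for the sum of n independent Bernoulli(p) variables."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


n, p = 20, 0.1   # assumed block length and per-bit error probability
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

# Probability that at most two of the n bits are received in error.
p_at_most_two_errors = sum(pmf[:3])
print(mean, variance, p_at_most_two_errors)
```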


The Uniform Random Variable 

The uniform random variable is a continuous random variable with PDF 

$$p(x) = \begin{cases} \dfrac{1}{b - a} & a < x < b \\ 0 & \text{otherwise} \end{cases} \tag{2.3-5}$$

where b > a and the interval [a, b] is the range of the random variable. Here we have 

$$E[X] = \frac{a + b}{2} \tag{2.3-6}$$

$$\mathrm{VAR}[X] = \frac{(b - a)^2}{12} \tag{2.3-7}$$

The Gaussian (Normal) Random Variable 

The Gaussian random variable is described in terms of two parameters $m \in \mathbb{R}$ and 
$\sigma > 0$ by the PDF 

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-m)^2}{2\sigma^2}} \tag{2.3-8}$$

We usually use the shorthand form $\mathcal{N}(m, \sigma^2)$ to denote the PDF of Gaussian random 
variables and write $X \sim \mathcal{N}(m, \sigma^2)$. For this random variable 

$$E[X] = m, \qquad \mathrm{VAR}[X] = \sigma^2 \tag{2.3-9}$$

A Gaussian random variable with m = 0 and $\sigma = 1$ is called a standard normal. A 
function closely related to the Gaussian random variable is the Q function defined as 

$$Q(x) = P[\mathcal{N}(0, 1) > x] = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-\frac{t^2}{2}}\,dt \tag{2.3-10}$$

FIGURE 2.3-1 
PDF and CDF of a Gaussian random variable. 


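The CDF of a Gaussian random variable is given by 

$$\begin{aligned}
F(x) &= \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(t-m)^2}{2\sigma^2}}\,dt \\
&= 1 - \int_{x}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(t-m)^2}{2\sigma^2}}\,dt \\
&= 1 - \int_{\frac{x-m}{\sigma}}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{u^2}{2}}\,du \\
&= 1 - Q\left(\frac{x - m}{\sigma}\right)
\end{aligned} \tag{2.3-11}$$

where we have introduced the change of variable $u = (t - m)/\sigma$. The PDF and the 
CDF of a Gaussian random variable are shown in Figure 2.3-1. 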

In general if $X \sim \mathcal{N}(m, \sigma^2)$, then 

$$P[X > \alpha] = Q\left(\frac{\alpha - m}{\sigma}\right), \qquad P[X < \alpha] = Q\left(\frac{m - \alpha}{\sigma}\right) \tag{2.3-12}$$

Following are some of the important properties of the Q function: 

$$Q(0) = \tfrac{1}{2}, \qquad Q(\infty) = 0 \tag{2.3-13}$$

$$Q(-\infty) = 1, \qquad Q(-x) = 1 - Q(x) \tag{2.3-14}$$

Some useful bounds for the Q function for x > 0 are 

$$\begin{aligned}
Q(x) &\le \tfrac{1}{2}\,e^{-\frac{x^2}{2}} \\
Q(x) &< \frac{1}{x\sqrt{2\pi}}\,e^{-\frac{x^2}{2}} \\
Q(x) &> \frac{x}{(1 + x^2)\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}
\end{aligned} \tag{2.3-15}$$

FIGURE 2.3-2 
Plot of Q(x) and its upper and lower bounds. 

From the last two bounds we conclude that for large x we have 

$$Q(x) \approx \frac{1}{x\sqrt{2\pi}}\,e^{-\frac{x^2}{2}} \tag{2.3-16}$$

A plot of the Q function bounds is given in Figure 2.3-2. Tables 2.3-1 and 2.3-2 give 
values of the Q function. 
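
The Q function and its bounds are straightforward to evaluate numerically. The sketch below is an illustration rather than a reference implementation: it computes Q(x) through the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ (which appears as Equation 2.3-18) using `math.erfc` from the Python standard library, and the helper names `q_function` and `q_bounds` are assumptions.

```python
import math


def q_function(x):
    """Q(x) = P[N(0,1) > x], computed as 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_bounds(x):
    """Return (lower, upper, simple_upper) bounds of Equation 2.3-15 for x > 0."""
    gauss = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    lower = x / (1.0 + x * x) * gauss
    upper = gauss / x
    simple_upper = 0.5 * math.exp(-x * x / 2.0)
    return lower, upper, simple_upper


for x in (0.5, 1.0, 3.0, 5.0):
    lower, upper, simple_upper = q_bounds(x)
    print(f"x={x}: {lower:.4e} < Q(x)={q_function(x):.4e} < {upper:.4e}"
          f" (simple bound {simple_upper:.4e})")
```

For large x the lower and upper bounds approach each other, which is the content of the approximation in Equation 2.3-16.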


TABLE 2.3-1 
Table of Q Function Values 

x      Q(x)        x      Q(x)        x      Q(x)              x      Q(x)
0      0.500000    1.8    0.035930    3.6    0.000159          5.4    3.3320 x 10^-8
0.1    0.460170    1.9    0.028717    3.7    0.000108          5.5    1.8990 x 10^-8
0.2    0.420740    2      0.022750    3.8    7.2348 x 10^-5    5.6    1.0718 x 10^-8
0.3    0.382090    2.1    0.017864    3.9    4.8096 x 10^-5    5.7    5.9904 x 10^-9
0.4    0.344580    2.2    0.013903    4      3.1671 x 10^-5    5.8    3.3157 x 10^-9
0.5    0.308540    2.3    0.010724    4.1    2.0658 x 10^-5    5.9    1.8175 x 10^-9
0.6    0.274250    2.4    0.008198    4.2    1.3346 x 10^-5    6      9.8659 x 10^-10
0.7    0.241960    2.5    0.006210    4.3    8.5399 x 10^-6    6.1    5.3034 x 10^-10
0.8    0.211860    2.6    0.004661    4.4    5.4125 x 10^-6    6.2    2.8232 x 10^-10
0.9    0.184060    2.7    0.003467    4.5    3.3977 x 10^-6    6.3    1.4882 x 10^-10
1      0.158660    2.8    0.002555    4.6    2.1125 x 10^-6    6.4    7.7689 x 10^-11
1.1    0.135670    2.9    0.001866    4.7    1.3008 x 10^-6    6.5    4.0160 x 10^-11
1.2    0.115070    3      0.001350    4.8    7.9333 x 10^-7    6.6    2.0558 x 10^-11
1.3    0.096800    3.1    0.000968    4.9    4.7918 x 10^-7    6.7    1.0421 x 10^-11
1.4    0.080757    3.2    0.000687    5      2.8665 x 10^-7    6.8    5.2309 x 10^-12
1.5    0.066807    3.3    0.000483    5.1    1.6983 x 10^-7    6.9    2.6001 x 10^-12
1.6    0.054799    3.4    0.000337    5.2    9.9644 x 10^-8    7      1.2799 x 10^-12
1.7    0.044565    3.5    0.000233    5.3    5.7901 x 10^-8    7.1    6.2378 x 10^-13


TABLE 2.3-2 
Selected Q Function Values 

Q(x)             x
10^-1            1.2816
10^-2            2.3263
10^-3            3.0902
10^-4            3.7190
10^-5            4.2649
10^-6            4.7534
10^-7            5.1993
0.5 x 10^-5      4.4172
0.25 x 10^-5     4.5648
0.667 x 10^-5    4.3545


Another function closely related to the Q function is the complementary error 
function, defined as 

$$\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^{\infty} e^{-t^2}\,dt \tag{2.3-17}$$

The complementary error function is related to the Q function as follows: 

$$Q(x) = \tfrac{1}{2}\,\mathrm{erfc}\left(\frac{x}{\sqrt{2}}\right), \qquad \mathrm{erfc}(x) = 2Q\left(\sqrt{2}\,x\right) \tag{2.3-18}$$

The characteristic function† of a Gaussian random variable is given by 

$$\Phi_X(\omega) = e^{j\omega m - \frac{1}{2}\omega^2\sigma^2} \tag{2.3-19}$$

Problem 2.21 shows that for an $\mathcal{N}(m, \sigma^2)$ random variable we have 

$$E\left[(X - m)^n\right] = \begin{cases} 1 \times 3 \times 5 \times \cdots \times (2k - 1)\,\sigma^{2k} = \dfrac{(2k)!\,\sigma^{2k}}{2^k\,k!} & \text{for } n = 2k \\ 0 & \text{for } n = 2k + 1 \end{cases} \tag{2.3-20}$$

from which we can obtain moments of the Gaussian random variable. 

The sum of n independent Gaussian random variables is a Gaussian random variable 
whose mean and variance are the sum of the means and the sum of the variances of the 
random variables, respectively. 

†Recall that for any random variable X, the characteristic function is defined by $\Phi_X(\omega) = E[e^{j\omega X}]$. 
The moment generating function (MGF) is defined by $\Theta_X(t) = E[e^{tX}]$. Obviously, $\Theta(t) = \Phi(-jt)$ and 
$\Phi(\omega) = \Theta(j\omega)$. 


The Chi-Square ($\chi^2$) Random Variable 

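If $\{X_i,\ i = 1, \ldots, n\}$ are iid (independent and identically distributed) zero-mean 
Gaussian random variables with common variance $\sigma^2$ and we define 

$$X = \sum_{i=1}^{n} X_i^2$$

then X is a $\chi^2$ random variable with n degrees of freedom. The PDF of this random 
variable is given by 

$$p(x) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)\sigma^n}\,x^{\frac{n}{2}-1}\,e^{-\frac{x}{2\sigma^2}} & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-21}$$

where $\Gamma(x)$ is the gamma function defined by 

$$\Gamma(x) = \int_0^{\infty} t^{x-1}\,e^{-t}\,dt \tag{2.3-22}$$

The gamma function has simple poles at x = 0, -1, -2, -3, ... and satisfies the 
following properties. The gamma function can be thought of as a generalization of the 
notion of factorial. 

$$\begin{aligned}
\Gamma(x + 1) &= x\,\Gamma(x) \\
\Gamma(1) &= 1 \\
\Gamma\left(\tfrac{1}{2}\right) &= \sqrt{\pi} \\
\Gamma\left(\tfrac{n}{2} + 1\right) &= \begin{cases} \left(\tfrac{n}{2}\right)! & n \text{ even and positive} \\ \sqrt{\pi}\,\dfrac{n(n-2)(n-4)\cdots 3 \times 1}{2^{\frac{n+1}{2}}} & n \text{ odd and positive} \end{cases}
\end{aligned} \tag{2.3-23}$$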


When n is even, i.e., n = 2m, the CDF of the $\chi^2$ random variable with n degrees 
of freedom has a closed form given by 

$$F(x) = \begin{cases} 1 - e^{-\frac{x}{2\sigma^2}}\displaystyle\sum_{k=0}^{m-1} \frac{1}{k!}\left(\frac{x}{2\sigma^2}\right)^k & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-24}$$

The mean and variance of a $\chi^2$ random variable with n degrees of freedom are given by 

$$E[X] = n\sigma^2, \qquad \mathrm{VAR}[X] = 2n\sigma^4 \tag{2.3-25}$$

The characteristic function for a $\chi^2$ random variable with n degrees of freedom is 
given by 

$$\Phi(\omega) = \left(\frac{1}{1 - 2j\omega\sigma^2}\right)^{\frac{n}{2}} \tag{2.3-26}$$

The special case of a $\chi^2$ random variable with two degrees of freedom is of particular 
interest. In this case the PDF is given by 

$$p(x) = \begin{cases} \dfrac{1}{2\sigma^2}\,e^{-\frac{x}{2\sigma^2}} & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-27}$$

This is the PDF of an exponential random variable with mean equal to $2\sigma^2$. 

The $\chi^2$ random variable is a special case of a gamma random variable. A gamma 
random variable is defined by a PDF of the form 

$$p(x) = \begin{cases} \dfrac{\lambda(\lambda x)^{\alpha-1}\,e^{-\lambda x}}{\Gamma(\alpha)} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-28}$$

where $\lambda, \alpha > 0$. A $\chi^2$ random variable is a gamma random variable with $\lambda = \frac{1}{2\sigma^2}$ and 
$\alpha = \frac{n}{2}$. 

Plots of the $\chi^2$ random variable with n degrees of freedom for different values of 
n are shown in Figure 2.3-3. 
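
The closed-form CDF for an even number of degrees of freedom can be cross-checked against a direct numerical integration of the PDF. The sketch below is illustrative only: the parameter values (`n = 4`, `sigma = 1.5`, `x0 = 6.0`), the midpoint-rule helper `integrate`, and the truncation of the mean integral at 200 are assumptions made for the example.

```python
import math


def chi2_pdf(x, n, sigma):
    """Chi-square PDF with n degrees of freedom (Equation 2.3-21)."""
    if x <= 0.0:
        return 0.0
    norm_const = 2.0 ** (n / 2.0) * math.gamma(n / 2.0) * sigma ** n
    return x ** (n / 2.0 - 1.0) * math.exp(-x / (2.0 * sigma ** 2)) / norm_const


def chi2_cdf_even(x, m, sigma):
    """Closed-form CDF for n = 2m degrees of freedom (Equation 2.3-24)."""
    if x <= 0.0:
        return 0.0
    z = x / (2.0 * sigma ** 2)
    return 1.0 - math.exp(-z) * sum(z ** k / math.factorial(k) for k in range(m))


def integrate(f, a, b, steps=20000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h


n, sigma, x0 = 4, 1.5, 6.0     # assumed example values
cdf_closed = chi2_cdf_even(x0, n // 2, sigma)
cdf_numeric = integrate(lambda x: chi2_pdf(x, n, sigma), 0.0, x0)
mean_numeric = integrate(lambda x: x * chi2_pdf(x, n, sigma), 0.0, 200.0)
print(cdf_closed, cdf_numeric, mean_numeric)
```

The numerically integrated mean should be close to $n\sigma^2$, in agreement with Equation 2.3-25.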


The Noncentral Chi-Square (x 2 ) Random Variable 

The noncentral x 2 random variable with n degrees of freedom is defined similarly to a 
X 2 random variable in which X,-’s are independent Gaussians with common variance 
(j - but with different means denoted by m , . This random variable has a PDF of the form 


p(x) = 





x > 0 

otherwise 


(2.3-29) 



FIGURE 2.3-3 

The PDF of the yf random variable for different values of n. All plots are shown for a = 1. 
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where 5 is defined as 


5 = 


\ 

\ '=1 


(2.3-30) 


and I a (x ) is the modified Bessel function of the first kind and order a given by 

(. x/2) a+lk 


I a (x) = X 


k = 0 


k\ r (a+k+ 1)’ 


x > 0 


(2.3-31) 


where T(x) is the gamma function defined by Equation 2.3-22. The function Ifx) can 
be written as 


2 k k\ 


Io(x) = X 

k = 0 

and for x > 1 can be approximated by 

C 

7 0 (x) ~ ~pF= 

\J2nx 

Two other expressions for 7 q(x), which are used frequently, are 


(2.3-32) 


(2.3-33) 


1 r 7 

Io(x) = - / 
7T Jo 

1 / 


/oM = — / e 
2n Jo 


± x cosip dfj) 

r 

■ xcos * df 


(2.3-34) 


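The CDF of this random variable, when $n = 2m$, can be written in the form

$$F(x) = \begin{cases} 1 - Q_m\!\left(\dfrac{s}{\sigma}, \dfrac{\sqrt{x}}{\sigma}\right) & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-35}$$

where $Q_m(a, b)$ is the generalized Marcum Q function and is defined as

$$Q_m(a, b) = \int_b^{\infty} x \left(\frac{x}{a}\right)^{m-1} e^{-(x^2+a^2)/2}\, I_{m-1}(ax)\, dx
= Q_1(a, b) + e^{-(a^2+b^2)/2} \sum_{k=1}^{m-1} \left(\frac{b}{a}\right)^{k} I_k(ab) \tag{2.3-36}$$

In Equation 2.3-36, $Q_1(a, b)$ is the Marcum Q function defined as

$$Q_1(a, b) = \int_b^{\infty} x\, e^{-\frac{a^2+x^2}{2}}\, I_0(ax)\, dx \tag{2.3-37}$$

or

$$Q_1(a, b) = e^{-\frac{a^2+b^2}{2}} \sum_{k=0}^{\infty} \left(\frac{a}{b}\right)^{k} I_k(ab), \qquad b > a > 0 \tag{2.3-38}$$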
This function satisfies the following properties:

$$\begin{aligned} Q_1(x, 0) &= 1 \\ Q_1(0, x) &= e^{-x^2/2} \\ Q_1(a, b) &\approx Q(b - a) \quad \text{for } b \gg 1 \text{ and } b \gg b - a \end{aligned} \tag{2.3-39}$$
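The integral and series definitions of the Marcum Q function, and the first two properties listed above, lend themselves to a quick numerical cross-check. This is a rough sketch under stated assumptions: the integral is truncated at a finite upper limit and evaluated with the midpoint rule, the Bessel function comes from a truncated power series, and every name below is hypothetical.

```python
import math

def bessel_i(order, x, terms=40):
    """Truncated power series for the modified Bessel function I_order(x)."""
    return sum((x / 2.0) ** (order + 2 * k)
               / (math.factorial(k) * math.gamma(order + k + 1))
               for k in range(terms))

def marcum_q1_integral(a, b, upper_margin=12.0, steps=2000):
    """Integral definition of Q_1(a, b), truncated at b + a + upper_margin."""
    upper = b + a + upper_margin
    h = (upper - b) / steps
    total = 0.0
    for i in range(steps):
        x = b + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x * math.exp(-(a * a + x * x) / 2.0) * bessel_i(0, a * x)
    return total * h

def marcum_q1_series(a, b, terms=30):
    """Series definition of Q_1(a, b); used here only for b > a > 0."""
    if not (b > a > 0):
        raise ValueError("series form used here requires b > a > 0")
    scale = math.exp(-(a * a + b * b) / 2.0)
    return scale * sum((a / b) ** k * bessel_i(k, a * b) for k in range(terms))

if __name__ == "__main__":
    print(marcum_q1_integral(1.0, 2.0), marcum_q1_series(1.0, 2.0))
    print(marcum_q1_integral(0.0, 1.5), math.exp(-1.5 ** 2 / 2.0))
```

The truncation limits are adequate for the small arguments used here; large arguments would need a more careful numerical treatment.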
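For a noncentral $\chi^2$ random variable, the mean and variance are given by

$$\begin{aligned} E[X] &= n\sigma^2 + s^2 \\ \text{VAR}[X] &= 2n\sigma^4 + 4\sigma^2 s^2 \end{aligned} \tag{2.3-40}$$

and the characteristic function is given by

$$\Phi(\omega) = \left(\frac{1}{1 - 2j\omega\sigma^2}\right)^{n/2} e^{\frac{j\omega s^2}{1 - 2j\omega\sigma^2}} \tag{2.3-41}$$

The Rayleigh Random Variable

If $X_1$ and $X_2$ are two iid Gaussian random variables each distributed according to
$\mathcal{N}(0, \sigma^2)$, then

$$X = \sqrt{X_1^2 + X_2^2} \tag{2.3-42}$$

is a Rayleigh random variable. From our discussion of the $\chi^2$ random variables, it is
readily seen that a Rayleigh random variable is the square root of a $\chi^2$ random variable
with two degrees of freedom. We can also conclude that the Rayleigh random variable
is the square root of an exponential random variable as given by Equation 2.3-27. The
PDF of a Rayleigh random variable is given by

$$p(x) = \begin{cases} \dfrac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}} & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-43}$$

and its mean and variance are

$$\begin{aligned} E[X] &= \sigma\sqrt{\frac{\pi}{2}} \\ \text{VAR}[X] &= \left(2 - \frac{\pi}{2}\right)\sigma^2 \end{aligned} \tag{2.3-44}$$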

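In general, the $k$th moment of a Rayleigh random variable is given by

$$E\left[X^k\right] = (2\sigma^2)^{k/2}\, \Gamma\!\left(\frac{k}{2} + 1\right) \tag{2.3-45}$$

and its characteristic function is given by

$$\Phi_X(\omega) = {}_1F_1\!\left(1, \frac{1}{2}; -\frac{1}{2}\omega^2\sigma^2\right) + j\sqrt{\frac{\pi}{2}}\, \omega\sigma\, e^{-\frac{\omega^2\sigma^2}{2}} \tag{2.3-46}$$

where ${}_1F_1(a, b; x)$ is the confluent hypergeometric function defined by

$${}_1F_1(a, b; x) = \sum_{k=0}^{\infty} \frac{\Gamma(a+k)\,\Gamma(b)\, x^k}{\Gamma(a)\,\Gamma(b+k)\, k!}, \qquad b \ne 0, -1, -2, \ldots \tag{2.3-47}$$

The function ${}_1F_1(a, b; x)$ can also be written as the integral

$${}_1F_1(a, b; x) = \frac{\Gamma(b)}{\Gamma(b-a)\,\Gamma(a)} \int_0^1 e^{xt}\, t^{a-1} (1-t)^{b-a-1}\, dt \tag{2.3-48}$$

In Beaulieu (1990), it is shown that

$${}_1F_1\!\left(1, \frac{1}{2}; -x\right) = -e^{-x} \sum_{k=0}^{\infty} \frac{x^k}{(2k-1)\,k!} \tag{2.3-49}$$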


The CDF of a Rayleigh random variable can be easily found by integrating the
PDF. The result is

$$F(x) = \begin{cases} 1 - e^{-\frac{x^2}{2\sigma^2}} & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-50}$$

The PDF of a Rayleigh random variable is plotted in Figure 2.3-4.
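The Rayleigh mean, variance, and CDF can be checked against samples generated directly from the definition, the magnitude of two iid zero-mean Gaussians. The sketch below is illustrative; it uses the standard closed forms $E[X] = \sigma\sqrt{\pi/2}$, $\text{VAR}[X] = (2 - \pi/2)\sigma^2$, and $F(x) = 1 - e^{-x^2/(2\sigma^2)}$, and the helper names, sample size, and seed are assumptions.

```python
import math
import random

def rayleigh_samples(sigma, count, seed=0):
    """X = sqrt(X1^2 + X2^2) with X1, X2 iid N(0, sigma^2)."""
    rng = random.Random(seed)
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(count)]

def rayleigh_mean(sigma):
    return sigma * math.sqrt(math.pi / 2.0)

def rayleigh_variance(sigma):
    return (2.0 - math.pi / 2.0) * sigma ** 2

def rayleigh_cdf(x, sigma):
    return 1.0 - math.exp(-x * x / (2.0 * sigma ** 2)) if x > 0 else 0.0

def sample_mean_and_variance(samples):
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, variance

if __name__ == "__main__":
    sigma = 2.0
    mean, variance = sample_mean_and_variance(rayleigh_samples(sigma, 100_000))
    print(mean, rayleigh_mean(sigma))
    print(variance, rayleigh_variance(sigma))
```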

FIGURE 2.3-4
The PDF of the Rayleigh random variable for three different values of $\sigma$.

A generalized version of the Rayleigh random variable is obtained when we have
$n$ iid zero-mean Gaussian random variables $\{X_i, 1 \le i \le n\}$ where each $X_i$ has an
$\mathcal{N}(0, \sigma^2)$ distribution. In this case

$$X = \sqrt{\sum_{i=1}^{n} X_i^2} \tag{2.3-51}$$

has a generalized Rayleigh distribution. The PDF for this random variable is given by

$$p(x) = \begin{cases} \dfrac{x^{n-1}}{2^{\frac{n-2}{2}} \sigma^n\, \Gamma\!\left(\frac{n}{2}\right)}\, e^{-\frac{x^2}{2\sigma^2}} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-52}$$

For the generalized Rayleigh, and with $n = 2m$, the CDF is given by

$$F(x) = \begin{cases} 1 - e^{-\frac{x^2}{2\sigma^2}} \displaystyle\sum_{k=0}^{m-1} \frac{1}{k!} \left(\frac{x^2}{2\sigma^2}\right)^{k} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-53}$$

The $k$th moment of a generalized Rayleigh for any integer value of $n$ (even or odd) is
given by

$$E\left[X^k\right] = (2\sigma^2)^{k/2}\, \frac{\Gamma\!\left(\frac{n+k}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)} \tag{2.3-54}$$
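The Ricean Random Variable

If $X_1$ and $X_2$ are two independent Gaussian random variables distributed according to
$\mathcal{N}(m_1, \sigma^2)$ and $\mathcal{N}(m_2, \sigma^2)$ (i.e., the variances are equal and the means may be different),
then

$$X = \sqrt{X_1^2 + X_2^2} \tag{2.3-55}$$

is a Ricean random variable with PDF

$$p(x) = \begin{cases} \dfrac{x}{\sigma^2}\, I_0\!\left(\dfrac{sx}{\sigma^2}\right) e^{-\frac{x^2+s^2}{2\sigma^2}} & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-56}$$

where $s = \sqrt{m_1^2 + m_2^2}$ and $I_0(x)$ is given by Equation 2.3-32. It is clear that a Ricean
random variable is the square root of a noncentral $\chi^2$ random variable with two degrees
of freedom.

It is readily seen that for $s = 0$, the Ricean random variable reduces to a Rayleigh
random variable. For large $s$ the Ricean random variable can be well approximated by
a Gaussian random variable.

The CDF of a Ricean random variable can be expressed as

$$F(x) = \begin{cases} 1 - Q_1\!\left(\dfrac{s}{\sigma}, \dfrac{x}{\sigma}\right) & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-57}$$

where $Q_1(a, b)$ is defined by Equations 2.3-37 and 2.3-38.

The first two moments of the Ricean random variable are given by

$$\begin{aligned} E[X] &= \sigma\sqrt{\frac{\pi}{2}}\; {}_1F_1\!\left(-\frac{1}{2}, 1; -\frac{s^2}{2\sigma^2}\right)
= \sigma\sqrt{\frac{\pi}{2}}\, e^{-\frac{K}{2}} \left[(1+K)\, I_0\!\left(\frac{K}{2}\right) + K\, I_1\!\left(\frac{K}{2}\right)\right] \\ E\left[X^2\right] &= 2\sigma^2 + s^2 \end{aligned} \tag{2.3-58}$$

where $K$ is the Rice factor defined in Equation 2.3-60.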
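The first two Ricean moments can be compared with a simulation built directly from the definition. The sketch below assumes the standard closed form for the Ricean mean in terms of the Rice factor $K = s^2/(2\sigma^2)$, namely $\sigma\sqrt{\pi/2}\,e^{-K/2}[(1+K)I_0(K/2) + K I_1(K/2)]$; the Bessel series truncation, the sample size, and all function names are illustrative assumptions.

```python
import math
import random

def bessel_i(order, x, terms=40):
    """Truncated power series for the modified Bessel function I_order(x)."""
    return sum((x / 2.0) ** (order + 2 * k)
               / (math.factorial(k) * math.gamma(order + k + 1))
               for k in range(terms))

def rice_mean(s, sigma):
    """Assumed closed form of E[X] in terms of K = s^2 / (2 sigma^2)."""
    k_factor = s * s / (2.0 * sigma * sigma)
    half = k_factor / 2.0
    bracket = (1.0 + k_factor) * bessel_i(0, half) + k_factor * bessel_i(1, half)
    return sigma * math.sqrt(math.pi / 2.0) * math.exp(-half) * bracket

def rice_second_moment(s, sigma):
    """E[X^2] = 2 sigma^2 + s^2."""
    return 2.0 * sigma * sigma + s * s

def rice_samples(m1, m2, sigma, count, seed=0):
    """X = sqrt(X1^2 + X2^2) with X1 ~ N(m1, sigma^2), X2 ~ N(m2, sigma^2)."""
    rng = random.Random(seed)
    return [math.hypot(rng.gauss(m1, sigma), rng.gauss(m2, sigma))
            for _ in range(count)]

if __name__ == "__main__":
    samples = rice_samples(1.0, 1.0, 1.0, 100_000)
    s = math.sqrt(2.0)
    print(sum(samples) / len(samples), rice_mean(s, 1.0))
    print(sum(x * x for x in samples) / len(samples), rice_second_moment(s, 1.0))
```

Setting $s = 0$ in `rice_mean` recovers the Rayleigh mean $\sigma\sqrt{\pi/2}$, consistent with the reduction noted in the text.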
In general, the $k$th moment of this random variable is given by

$$E\left[X^k\right] = (2\sigma^2)^{k/2}\, \Gamma\!\left(1 + \frac{k}{2}\right) {}_1F_1\!\left(-\frac{k}{2}, 1; -\frac{s^2}{2\sigma^2}\right) \tag{2.3-59}$$

Another form of the Ricean density function is obtained by defining the Rice factor
$K$ as

$$K = \frac{s^2}{2\sigma^2} \tag{2.3-60}$$

If we define $A = s^2 + 2\sigma^2$, the Ricean PDF can be written as

$$p(x) = \begin{cases} \dfrac{2(K+1)}{A}\, x\, e^{-\frac{K+1}{A}\left(x^2 + \frac{AK}{K+1}\right)}\, I_0\!\left(2x\sqrt{\dfrac{K(K+1)}{A}}\right) & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-61}$$

For the normalized case when $A = 1$ (or, equivalently, when $E[X^2] = s^2 + 2\sigma^2 = 1$)
this reduces to

$$p(x) = \begin{cases} 2(K+1)\, x\, e^{-(K+1)\left(x^2 + \frac{K}{K+1}\right)}\, I_0\!\left(2x\sqrt{K(K+1)}\right) & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-62}$$

A plot of the PDF of a Ricean random variable for different values of $K$ is shown
in Figure 2.3-5.

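FIGURE 2.3-5
The Ricean PDF for different values of $K$. For small $K$ this random variable reduces to a
Rayleigh random variable, and for large $K$ it is well approximated by a Gaussian random
variable.

Similar to the Rayleigh random variable, a generalized Ricean random variable
can be defined as

$$X = \sqrt{\sum_{i=1}^{n} X_i^2} \tag{2.3-63}$$

where the $X_i$'s are independent Gaussians with mean $m_i$ and common variance $\sigma^2$. In this
case the PDF is given by

$$p(x) = \begin{cases} \dfrac{x^{n/2}}{\sigma^2 s^{\frac{n-2}{2}}}\, e^{-\frac{x^2+s^2}{2\sigma^2}}\, I_{\frac{n}{2}-1}\!\left(\dfrac{xs}{\sigma^2}\right) & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-64}$$

and, for $n = 2m$, the CDF is given by

$$F(x) = \begin{cases} 1 - Q_m\!\left(\dfrac{s}{\sigma}, \dfrac{x}{\sigma}\right) & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-65}$$

where

$$s = \sqrt{\sum_{i=1}^{n} m_i^2}$$

The $k$th moment of a generalized Ricean is given by

$$E\left[X^k\right] = (2\sigma^2)^{k/2}\, e^{-\frac{s^2}{2\sigma^2}}\, \frac{\Gamma\!\left(\frac{n+k}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)}\; {}_1F_1\!\left(\frac{n+k}{2}, \frac{n}{2}; \frac{s^2}{2\sigma^2}\right) \tag{2.3-66}$$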


The Nakagami Random Variable

Both the Rayleigh distribution and the Rice distribution are frequently used to describe
the statistical fluctuations of signals received from a multipath fading channel. These
channel models are considered in Chapters 13 and 14. Another distribution that is
frequently used to characterize the statistics of signals transmitted through multipath
fading channels is the Nakagami $m$ distribution. The PDF for this distribution is given
by Nakagami (1960) as

$$p(x) = \begin{cases} \dfrac{2}{\Gamma(m)} \left(\dfrac{m}{\Omega}\right)^{m} x^{2m-1} e^{-mx^2/\Omega} & x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-67}$$

where $\Omega$ is defined as

$$\Omega = E\left[X^2\right] \tag{2.3-68}$$

and the parameter $m$, called the fading figure, is defined as the ratio of moments

$$m = \frac{\Omega^2}{E\left[(X^2 - \Omega)^2\right]}, \qquad m \ge \frac{1}{2} \tag{2.3-69}$$

A normalized version of Equation 2.3-67 may be obtained by defining another
random variable $Y = X/\sqrt{\Omega}$ (see Problem 2.42). The $n$th moment of $X$ is

$$E\left[X^n\right] = \frac{\Gamma\!\left(m + \frac{n}{2}\right)}{\Gamma(m)} \left(\frac{\Omega}{m}\right)^{n/2} \tag{2.3-70}$$
The mean and the variance for this random variable are given by

$$\begin{aligned} E[X] &= \frac{\Gamma\!\left(m + \frac{1}{2}\right)}{\Gamma(m)} \left(\frac{\Omega}{m}\right)^{1/2} \\ \text{VAR}[X] &= \Omega \left[1 - \frac{1}{m} \left(\frac{\Gamma\!\left(m + \frac{1}{2}\right)}{\Gamma(m)}\right)^{2}\right] \end{aligned} \tag{2.3-71}$$

By setting $m = 1$, we observe that Equation 2.3-67 reduces to a Rayleigh PDF.
For values of $m$ in the range $\frac{1}{2} \le m < 1$, we obtain PDFs that have larger tails than a
Rayleigh-distributed random variable. For values of $m > 1$, the tail of the PDF decays
faster than that of the Rayleigh. Figure 2.3-6 illustrates the Nakagami PDF for different
values of $m$.

FIGURE 2.3-6
The PDF for the Nakagami $m$ distribution, shown with $\Omega = 1$. $m$ is the fading figure.
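The reduction of the Nakagami PDF to a Rayleigh PDF at $m = 1$ can be verified pointwise, using $\Omega = E[X^2] = 2\sigma^2$ for the Rayleigh case. The following sketch is illustrative; it assumes the Nakagami PDF $\frac{2}{\Gamma(m)}(m/\Omega)^m x^{2m-1}e^{-mx^2/\Omega}$ and moment formula $\frac{\Gamma(m+n/2)}{\Gamma(m)}(\Omega/m)^{n/2}$, and the function names are hypothetical.

```python
import math

def nakagami_pdf(x, m, omega):
    """Nakagami-m PDF with fading figure m and second moment omega."""
    if x <= 0:
        return 0.0
    return (2.0 / math.gamma(m)) * (m / omega) ** m \
        * x ** (2.0 * m - 1.0) * math.exp(-m * x * x / omega)

def rayleigh_pdf(x, sigma):
    """Rayleigh PDF with parameter sigma."""
    if x <= 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))

def nakagami_moment(n, m, omega):
    """n-th moment of the Nakagami-m random variable."""
    return math.gamma(m + n / 2.0) / math.gamma(m) * (omega / m) ** (n / 2.0)

if __name__ == "__main__":
    sigma = 1.3
    omega = 2.0 * sigma ** 2  # second moment of a Rayleigh variable
    for x in (0.2, 1.0, 3.0):
        print(x, nakagami_pdf(x, 1.0, omega), rayleigh_pdf(x, sigma))
```

As a consistency check, the second moment returned by `nakagami_moment(2, m, omega)` equals $\Omega$ for any admissible $m$, as the definition of $\Omega$ requires.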
The Lognormal Random Variable

Suppose that a random variable $Y$ is normally distributed with mean $m$ and variance $\sigma^2$.
Let us define a new random variable $X$ that is related to $Y$ through the transformation
$Y = \ln X$ (or $X = e^Y$). Then the PDF of $X$ is

$$p(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi\sigma^2}\, x}\, e^{-(\ln x - m)^2 / 2\sigma^2} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.3-72}$$

For this random variable

$$\begin{aligned} E[X] &= e^{m + \frac{\sigma^2}{2}} \\ \text{VAR}[X] &= e^{2m + \sigma^2} \left(e^{\sigma^2} - 1\right) \end{aligned} \tag{2.3-73}$$

The lognormal distribution is suitable for modeling the effect of shadowing of the
signal due to large obstructions, such as tall buildings, in mobile radio communications.
Examples of the lognormal PDF are shown in Figure 2.3-7.
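The lognormal mean and variance formulas, $e^{m+\sigma^2/2}$ and $e^{2m+\sigma^2}(e^{\sigma^2}-1)$, can be checked by exponentiating Gaussian samples. This is a minimal sketch with assumed parameter values and hypothetical helper names; a moderate $\sigma$ is chosen because sample moments of a lognormal converge slowly when $\sigma$ is large.

```python
import math
import random

def lognormal_mean(m, sigma):
    return math.exp(m + sigma ** 2 / 2.0)

def lognormal_variance(m, sigma):
    return math.exp(2.0 * m + sigma ** 2) * (math.exp(sigma ** 2) - 1.0)

def lognormal_samples(m, sigma, count, seed=0):
    """X = exp(Y) with Y ~ N(m, sigma^2)."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(m, sigma)) for _ in range(count)]

def sample_mean_and_variance(samples):
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, variance

if __name__ == "__main__":
    m, sigma = 0.2, 0.5
    mean, variance = sample_mean_and_variance(lognormal_samples(m, sigma, 100_000))
    print(mean, lognormal_mean(m, sigma))
    print(variance, lognormal_variance(m, sigma))
```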


FIGURE 2.3-7
Lognormal PDF with $\sigma = 1$ for different values of $m$.

Jointly Gaussian Random Variables

An $n \times 1$ column random vector $\mathbf{X}$ with components $\{X_i, 1 \le i \le n\}$ is called a
Gaussian vector, and its components are called jointly Gaussian random variables or
multivariate Gaussian random variables if the joint PDF of the $X_i$'s can be written as

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} (\det \mathbf{C})^{1/2}}\, e^{-\frac{1}{2} (\mathbf{x} - \mathbf{m})^t \mathbf{C}^{-1} (\mathbf{x} - \mathbf{m})} \tag{2.3-74}$$

where $\mathbf{m}$ and $\mathbf{C}$ are the mean vector and covariance matrix, respectively, of $\mathbf{X}$ and are
given by

$$\mathbf{m} = E[\mathbf{X}] \tag{2.3-75}$$

$$\mathbf{C} = E\left[(\mathbf{X} - \mathbf{m})(\mathbf{X} - \mathbf{m})^t\right] \tag{2.3-76}$$

From this definition it is clear that

$$C_{ij} = \text{COV}[X_i, X_j]$$

and therefore $\mathbf{C}$ is a symmetric matrix. From elementary probability it is also well
known that $\mathbf{C}$ is nonnegative definite.

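In the special case of $n = 2$, we have

$$\mathbf{m} = \begin{bmatrix} m_1 \\ m_2 \end{bmatrix}, \qquad \mathbf{C} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} \tag{2.3-77}$$

where

$$\rho = \frac{\text{COV}[X_1, X_2]}{\sigma_1\sigma_2}$$

is the correlation coefficient of the two random variables. In this case the PDF
reduces to

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{\left(\frac{x_1-m_1}{\sigma_1}\right)^2 + \left(\frac{x_2-m_2}{\sigma_2}\right)^2 - 2\rho\left(\frac{x_1-m_1}{\sigma_1}\right)\left(\frac{x_2-m_2}{\sigma_2}\right)}{2(1-\rho^2)}\right\} \tag{2.3-78}$$

where $m_1$, $m_2$, $\sigma_1^2$, and $\sigma_2^2$ are the means and variances of the two random variables and $\rho$
is their correlation coefficient. Note that in the special case when $\rho = 0$ (i.e., when the
two random variables are uncorrelated), we have

$$p(x_1, x_2) = \mathcal{N}(m_1, \sigma_1^2) \times \mathcal{N}(m_2, \sigma_2^2)$$

This means that the two random variables are independent, and therefore for this case
independence and uncorrelatedness are equivalent. This property is true for general
jointly Gaussian random variables.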

Another important property of jointly Gaussian random variables is that linear
combinations of jointly Gaussian random variables are also jointly Gaussian. In other
words, if $\mathbf{X}$ is a Gaussian vector, the random vector $\mathbf{Y} = \mathbf{A}\mathbf{X}$, where the invertible
matrix $\mathbf{A}$ represents a linear transformation, is also a Gaussian vector whose mean and
covariance matrix are given by

$$\begin{aligned} \mathbf{m}_Y &= \mathbf{A}\mathbf{m}_X \\ \mathbf{C}_Y &= \mathbf{A}\mathbf{C}_X\mathbf{A}^t \end{aligned} \tag{2.3-79}$$

This property is developed in Problem 2.23.
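The covariance rule $\mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}^t$ for $\mathbf{Y} = \mathbf{A}\mathbf{X}$ can be illustrated with a small simulation. The sketch below is an assumption-laden example rather than the book's method: it generates a zero-mean Gaussian vector as $\mathbf{X} = \mathbf{L}\mathbf{G}$ with $\mathbf{L}\mathbf{L}^t = \mathbf{C}_X$ and iid standard Gaussian $\mathbf{G}$, and the matrices, sample size, and helper names are all hypothetical.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def transformed_covariance(a, c_x):
    """C_Y = A C_X A^t."""
    return matmul(matmul(a, c_x), transpose(a))

def sample_covariance_of_ax(a, factor, count, seed=0):
    """Sample covariance of Y = A X, where X = L G and L L^t = C_X."""
    rng = random.Random(seed)
    n = len(a)
    ys = []
    for _ in range(count):
        g = [rng.gauss(0.0, 1.0) for _ in range(len(factor[0]))]
        x = [sum(row[j] * g[j] for j in range(len(g))) for row in factor]
        ys.append([sum(a[i][j] * x[j] for j in range(len(x))) for i in range(n)])
    means = [sum(y[i] for y in ys) / count for i in range(n)]
    return [[sum((y[i] - means[i]) * (y[j] - means[j]) for y in ys) / count
             for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    factor = [[2.0, 0.0], [0.6, 0.8]]          # L, so C_X = [[4, 1.2], [1.2, 1]]
    c_x = matmul(factor, transpose(factor))
    a = [[1.0, 2.0], [0.0, -1.0]]
    print(transformed_covariance(a, c_x))
    print(sample_covariance_of_ax(a, factor, 50_000))
```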

In summary, jointly Gaussian random variables have the following important 
properties: 

1. For jointly Gaussian random variables, uncorrelated is equivalent to independent. 

2. Linear combinations of jointly Gaussian random variables are themselves jointly 
Gaussian. 

3. The random variables in any subset of jointly Gaussian random variables are jointly 
Gaussian, and any subset of random variables conditioned on random variables in 
any other subset is also jointly Gaussian (all joint subsets and all conditional subsets 
are Gaussian). 

We also emphasize that any set of independent Gaussian random variables is jointly 
Gaussian, but this is not necessarily true for a set of dependent Gaussian random 
variables. 

Table 2.3-3 summarizes some of the properties of the most important random 
variables. 


2.4 BOUNDS ON TAIL PROBABILITIES

Performance analysis of communication systems requires computation of error probabilities
of these systems. In many cases, as we will observe in the following chapters,
the error probability of a communication system is expressed in terms of the probability
that a random variable exceeds a certain value, i.e., in the form of $P[X > \alpha]$. Unfortunately,
in many cases these probabilities cannot be expressed in closed form. In such
cases we are interested in finding upper bounds on these tail probabilities. These upper
bounds are of the form $P[X > \alpha] \le \beta$. In this section we describe different methods
for providing and tightening such bounds.

The Markov Inequality

The Markov inequality gives an upper bound on the tail probability of nonnegative
random variables. Let us assume that $X$ is a nonnegative random variable, i.e., $p(x) = 0$
for all $x < 0$, and assume $\alpha > 0$ is an arbitrary positive real number. The Markov
inequality states that

$$P[X \ge \alpha] \le \frac{E[X]}{\alpha} \tag{2.4-1}$$


TABLE 2.3-3 

Properties of Important Random Variables 
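To see this, we observe that

$$\begin{aligned} E[X] &= \int_0^{\infty} x\, p(x)\, dx \\ &\ge \int_{\alpha}^{\infty} x\, p(x)\, dx \\ &\ge \alpha \int_{\alpha}^{\infty} p(x)\, dx \\ &= \alpha\, P[X \ge \alpha] \end{aligned} \tag{2.4-2}$$

The second inequality holds because $x \ge \alpha$ over the range of integration. Dividing both
sides by $\alpha$ gives the desired inequality.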
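To get a feel for how loose the Markov inequality $P[X \ge \alpha] \le E[X]/\alpha$ can be, it may be compared with an exact tail probability. The sketch below uses an exponential random variable as an assumed test case (its tail is $e^{-\alpha/\mu}$ for mean $\mu$); the function names are hypothetical.

```python
import math

def exponential_tail(alpha, mean):
    """Exact P[X >= alpha] for an exponential random variable with the given mean."""
    return math.exp(-alpha / mean) if alpha > 0 else 1.0

def markov_bound(alpha, mean):
    """Markov upper bound E[X] / alpha on P[X >= alpha], valid for alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return mean / alpha

if __name__ == "__main__":
    mean = 2.0
    for alpha in (1.0, 2.0, 5.0, 20.0):
        print(alpha, exponential_tail(alpha, mean), markov_bound(alpha, mean))
```

The bound decays only as $1/\alpha$ while the true tail decays exponentially, which motivates the tighter Chernov bound.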


Chernov Bound

The Chernov bound is a very tight and useful bound that is obtained from the Markov
inequality. Unlike the Markov inequality, which is applicable only to nonnegative random
variables, the Chernov bound can be applied to all random variables.

Let $X$ be an arbitrary random variable, and let $\delta$ and $\nu$ be arbitrary real numbers
($\nu \ne 0$). Define random variable $Y$ by $Y = e^{\nu X}$ and constant $\alpha$ by $\alpha = e^{\nu\delta}$. Obviously,
$Y$ is a nonnegative random variable and $\alpha$ is a positive real number. Applying the
Markov inequality to $Y$ and $\alpha$ yields

$$P\left[e^{\nu X} \ge e^{\nu\delta}\right] \le \frac{E\left[e^{\nu X}\right]}{e^{\nu\delta}} = E\left[e^{\nu(X-\delta)}\right] \tag{2.4-3}$$

The event $\{e^{\nu X} \ge e^{\nu\delta}\}$ is equivalent to the event $\{\nu X \ge \nu\delta\}$, which for positive or
negative values of $\nu$ is equivalent to $\{X \ge \delta\}$ or $\{X \le \delta\}$, respectively. Therefore we
have

$$P[X \ge \delta] \le E\left[e^{\nu(X-\delta)}\right], \qquad \text{for all } \nu > 0 \tag{2.4-4}$$

$$P[X \le \delta] \le E\left[e^{\nu(X-\delta)}\right], \qquad \text{for all } \nu < 0 \tag{2.4-5}$$

Since the two inequalities are valid for all positive and negative values of $\nu$, respectively,
it makes sense to find the values of $\nu$ that give the tightest possible bounds.
To this end, we differentiate the right-hand side of the inequalities with respect to $\nu$ and
find its root; this is the value of $\nu$ that gives the tightest bound. From this point on,
we will consider only the first inequality. The extension to the second inequality is
straightforward.

Let us define function $g(\nu)$ to denote the right side of the inequalities, i.e.,

$$g(\nu) = E\left[e^{\nu(X-\delta)}\right]$$

Differentiating $g(\nu)$, we have

$$g'(\nu) = E\left[(X - \delta)\, e^{\nu(X-\delta)}\right] \tag{2.4-6}$$

The second derivative of $g(\nu)$ is given by

$$g''(\nu) = E\left[(X - \delta)^2\, e^{\nu(X-\delta)}\right]$$
It is easily seen that for all $\nu$, we have $g''(\nu) > 0$ and hence $g(\nu)$ is convex and $g'(\nu)$ is
an increasing function, and therefore can have only one root. In addition, since $g(\nu)$ is
convex, this single root minimizes $g(\nu)$ and therefore results in the tightest bound.
Putting $g'(\nu) = 0$, we find the root to be obtained by solving the equation

$$E\left[X e^{\nu X}\right] = \delta\, E\left[e^{\nu X}\right] \tag{2.4-7}$$

Equation 2.4-7 has a single root $\nu^*$ that gives the tightest bound. The only thing that
remains to be checked is to see whether this $\nu^*$ satisfies the $\nu^* > 0$ condition. Since $g'(\nu)$
is an increasing function, its only root is positive if $g'(0) < 0$. From Equation 2.4-6 we
have

$$g'(0) = E[X] - \delta$$

therefore $\nu^* > 0$ if and only if $\delta > E[X]$.

Summarizing, from Equations 2.4-4 and 2.4-5 we conclude

$$P[X \ge \delta] \le e^{-\nu^*\delta}\, E\left[e^{\nu^* X}\right] \qquad \text{for } \delta > E[X] \tag{2.4-8}$$

$$P[X \le \delta] \le e^{-\nu^*\delta}\, E\left[e^{\nu^* X}\right] \qquad \text{for } \delta < E[X] \tag{2.4-9}$$

where $\nu^*$ is the solution of Equation 2.4-7. Equations 2.4-8 and 2.4-9 are known as
Chernov bounds. Finding optimal $\nu^*$ by solving Equation 2.4-7 is sometimes difficult.
In such cases a numerical approximation or an educated guess gives a suboptimal
bound. The Chernov bound can also be given in terms of the moment generating
function (MGF) $\Theta_X(\nu) = E\left[e^{\nu X}\right]$ as

$$P[X \ge \delta] \le e^{-\nu^*\delta}\, \Theta_X(\nu^*), \qquad \text{for } \delta > E[X] \tag{2.4-10}$$

$$P[X \le \delta] \le e^{-\nu^*\delta}\, \Theta_X(\nu^*), \qquad \text{for } \delta < E[X] \tag{2.4-11}$$

EXAMPLE 2.4-1. Consider the Laplace PDF given by

$$p(x) = \frac{1}{2}\, e^{-|x|} \tag{2.4-12}$$

Let us evaluate the upper tail probability $P[X \ge \delta]$ for some $\delta > 0$ from the Chernov
bound and compare it with the true tail probability, which is

$$P[X \ge \delta] = \int_{\delta}^{\infty} \frac{1}{2}\, e^{-x}\, dx = \frac{1}{2}\, e^{-\delta} \tag{2.4-13}$$

First note that $E[X] = 0$, and therefore the condition $\delta > E[X]$ needed to use the
upper tail probability in the Chernov bound is satisfied. To solve Equation 2.4-7 for $\nu^*$,
we must determine $E\left[X e^{\nu X}\right]$ and $E\left[e^{\nu X}\right]$. For the PDF in Equation 2.4-12, we find
that $E\left[X e^{\nu X}\right]$ and $E\left[e^{\nu X}\right]$ converge only if $-1 < \nu < 1$, and for this range of values
of $\nu$ we have

$$\begin{aligned} E\left[X e^{\nu X}\right] &= \frac{2\nu}{(\nu+1)^2 (\nu-1)^2} \\ E\left[e^{\nu X}\right] &= \frac{1}{(1+\nu)(1-\nu)} \end{aligned} \tag{2.4-14}$$
Substituting these values into Equation 2.4-7, we obtain the quadratic equation

$$\nu^2\delta + 2\nu - \delta = 0$$

which has the solutions

$$\nu^* = \frac{-1 \pm \sqrt{1 + \delta^2}}{\delta} \tag{2.4-15}$$

Since $\nu^*$ must be in the $(-1, +1)$ interval for $E\left[X e^{\nu X}\right]$ and $E\left[e^{\nu X}\right]$ to converge, the
only acceptable solution is

$$\nu^* = \frac{-1 + \sqrt{1 + \delta^2}}{\delta} \tag{2.4-16}$$

Finally, we evaluate the upper bound in Equation 2.4-8 by substituting for $\nu^*$ from
Equation 2.4-16. The result is

$$P[X \ge \delta] \le \frac{\delta^2}{2\left(-1 + \sqrt{1 + \delta^2}\right)}\, e^{1 - \sqrt{1 + \delta^2}} \tag{2.4-17}$$

For $\delta \gg 1$, Equation 2.4-17 reduces to

$$P[X \ge \delta] \le \frac{\delta}{2}\, e^{-\delta} \tag{2.4-18}$$

We note that the Chernov bound decreases exponentially as $\delta$ increases. Consequently,
it approximates closely the exact tail probability given by Equation 2.4-13.
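The Laplace example can be reproduced numerically by computing the optimizing $\nu^*$, the resulting Chernov bound $e^{-\nu^*\delta}E[e^{\nu^*X}]$, and the exact tail $\frac{1}{2}e^{-\delta}$. The sketch below is illustrative; it uses the Laplace MGF $1/((1+\nu)(1-\nu))$ for $|\nu| < 1$ and the positive root of $\nu^2\delta + 2\nu - \delta = 0$, and the function names are assumptions.

```python
import math

def laplace_exact_tail(delta):
    """Exact P[X >= delta] for the unit Laplace PDF, delta > 0."""
    return 0.5 * math.exp(-delta)

def chernov_nu_star(delta):
    """Root of nu^2 * delta + 2 * nu - delta = 0 lying in (0, 1)."""
    return (-1.0 + math.sqrt(1.0 + delta * delta)) / delta

def laplace_mgf(nu):
    """E[exp(nu X)] for the unit Laplace PDF; finite only for |nu| < 1."""
    if not -1.0 < nu < 1.0:
        raise ValueError("MGF diverges outside (-1, 1)")
    return 1.0 / ((1.0 + nu) * (1.0 - nu))

def chernov_bound(delta):
    """exp(-nu* delta) * MGF(nu*), the optimized Chernov bound."""
    nu = chernov_nu_star(delta)
    return math.exp(-nu * delta) * laplace_mgf(nu)

def chernov_bound_closed_form(delta):
    """Closed-form expression of the same bound."""
    root = math.sqrt(1.0 + delta * delta)
    return delta * delta / (2.0 * (root - 1.0)) * math.exp(1.0 - root)

if __name__ == "__main__":
    for delta in (0.5, 1.0, 3.0, 10.0):
        print(delta, laplace_exact_tail(delta), chernov_bound(delta))
```

Both the bound and the exact tail decay exponentially in $\delta$; the bound exceeds the exact value by a factor that grows only polynomially.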

EXAMPLE 2.4-2. In performance analysis of communication systems over fading channels,
we encounter random variables of the form

$$X = d^2 R^2 + 2RdN \tag{2.4-19}$$

where $d$ is a constant, $R$ is a Ricean random variable with parameters $s$ and $\sigma$ representing
channel attenuation due to fading, and $N$ is a zero-mean Gaussian random variable
with variance $\frac{N_0}{2}$ representing channel noise. It is assumed that $R$ and $N$ are independent
random variables. We are interested in applying the Chernov bounding technique to
find an upper bound on $P[X < 0]$. From the Chernov bound given in Equation 2.4-5,
we have

$$P[X \le 0] \le E\left[e^{\nu X}\right], \qquad \text{for all } \nu < 0 \tag{2.4-20}$$

To determine $E\left[e^{\nu X}\right]$, we use the well-known relation

$$E[Y] = E\left[E[Y \mid X]\right] \tag{2.4-21}$$

from elementary probability. We note that conditioned on $R$, $X$ is a Gaussian random
variable with mean $d^2 R^2$ and variance $2R^2 d^2 N_0$. Using the relation for the moment
generating function of a Gaussian random variable from Table 2.3-3, we have

$$E\left[e^{\nu X} \mid R\right] = e^{\nu d^2 R^2 + \nu^2 d^2 N_0 R^2} = e^{\nu d^2 (1 + N_0\nu) R^2} \tag{2.4-22}$$
Now noting that $R^2$ is a noncentral $\chi^2$ random variable with two degrees of freedom,
and using the characteristic function for this random variable from Table 2.3-3, we
obtain

$$\begin{aligned} E\left[e^{\nu X}\right] &= E\left[E\left[e^{\nu X} \mid R\right]\right] \\ &= E\left[e^{\nu d^2 (1 + N_0\nu) R^2}\right] \\ &= \frac{1}{1 - 2\nu d^2 (1 + N_0\nu)\sigma^2}\, e^{\frac{\nu d^2 (1 + N_0\nu) s^2}{1 - 2\nu d^2 (1 + N_0\nu)\sigma^2}} \end{aligned} \tag{2.4-23}$$

where we have used Equation 2.4-21. From Equations 2.4-20 and 2.4-23 we conclude
that

$$P[X \le 0] \le \min_{\nu < 0}\; \frac{1}{1 - 2\nu d^2 (1 + N_0\nu)\sigma^2}\, e^{\frac{\nu d^2 (1 + N_0\nu) s^2}{1 - 2\nu d^2 (1 + N_0\nu)\sigma^2}} \tag{2.4-24}$$

It can be easily verified by differentiation that in the range of interest ($\nu < 0$), the right-hand
side is an increasing function of $\lambda = \nu d^2 (1 + N_0\nu)$, and therefore the minimum
is achieved when $\lambda$ is minimized. By simple differentiation we can verify that $\lambda$ is
minimized for $\nu = -\frac{1}{2N_0}$, for which $\lambda = -\frac{d^2}{4N_0}$, resulting in

$$P[X \le 0] \le \frac{1}{1 + \frac{d^2\sigma^2}{2N_0}}\, e^{-\frac{d^2 s^2 / (4N_0)}{1 + d^2\sigma^2 / (2N_0)}} \tag{2.4-25}$$

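If we use Equation 2.3-61 or 2.3-62 for the Ricean random variable, so that
$\sigma^2 = \frac{A}{2(K+1)}$ and $s^2 = \frac{AK}{K+1}$, we obtain the following bounds:

$$P[X \le 0] \le \frac{K+1}{K + 1 + \frac{A d^2}{4N_0}}\, e^{-\frac{A K d^2 / (4N_0)}{K + 1 + A d^2 / (4N_0)}} \tag{2.4-26}$$

and, for the normalized case $A = 1$,

$$P[X \le 0] \le \frac{K+1}{K + 1 + \frac{d^2}{4N_0}}\, e^{-\frac{K d^2 / (4N_0)}{K + 1 + d^2 / (4N_0)}} \tag{2.4-27}$$

For the case of Rayleigh fading channels, in which $s = 0$, these relations reduce to

$$P[X \le 0] \le \frac{1}{1 + \frac{d^2\sigma^2}{2N_0}} \tag{2.4-28}$$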

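Chernov Bound for Sums of Random Variables

Let $\{X_i\}$, $1 \le i \le n$, denote a sequence of iid random variables and define

$$Y = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{2.4-29}$$

We are interested in finding a bound on $P[Y > \delta]$, where $\delta > E[X]$. Applying the Chernov
bound, we have

$$\begin{aligned} P[Y > \delta] &= P\left[\sum_{i=1}^{n} X_i > n\delta\right] \\ &\le E\left[e^{\nu\left(\sum_{i=1}^{n} X_i - n\delta\right)}\right] \\ &= \left[E\left[e^{\nu(X-\delta)}\right]\right]^n, \qquad \nu > 0 \end{aligned} \tag{2.4-30}$$

To find the optimal choice of $\nu$ we equate the derivative of the right-hand side to
zero

$$\frac{d}{d\nu} \left[E\left[e^{\nu(X-\delta)}\right]\right]^n = n \left[E\left[e^{\nu(X-\delta)}\right]\right]^{n-1} E\left[(X - \delta)\, e^{\nu(X-\delta)}\right] = 0 \tag{2.4-31}$$

The single root of this equation is obtained by solving

$$E\left[X e^{\nu X}\right] = \delta\, E\left[e^{\nu X}\right] \tag{2.4-32}$$

which is exactly Equation 2.4-7. Therefore, for the sum of iid random variables we
find the $\nu^*$ solution of Equation 2.4-7, and then we use

$$P[Y > \delta] \le \left[E\left[e^{\nu^*(X-\delta)}\right]\right]^n = e^{-n\nu^*\delta} \left[E\left[e^{\nu^* X}\right]\right]^n \tag{2.4-33}$$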


EXAMPLE 2.4-3. The $X_i$'s are binary iid random variables with $P[X = 1] = 1 - P[X = -1] = p$,
where $p < \frac{1}{2}$. We are interested in finding a bound on

$$P\left[\sum_{i=1}^{n} X_i > 0\right]$$

We have $E[X] = p - (1 - p) = 2p - 1 < 0$. Assuming $\delta = 0$, the condition $\delta > E[X]$
is satisfied, and the preceding development can be applied to this case. We have

$$E\left[X e^{\nu X}\right] = p e^{\nu} - (1 - p) e^{-\nu} \tag{2.4-34}$$

and Equation 2.4-7 becomes

$$p e^{\nu} - (1 - p) e^{-\nu} = 0 \tag{2.4-35}$$

which has the unique solution

$$\nu^* = \frac{1}{2} \ln \frac{1 - p}{p} \tag{2.4-36}$$

Using this value, we have

$$E\left[e^{\nu^* X}\right] = p \sqrt{\frac{1 - p}{p}} + (1 - p) \sqrt{\frac{p}{1 - p}} = 2\sqrt{p(1 - p)} \tag{2.4-37}$$

Substituting this result into Equation 2.4-33 results in

$$P\left[\sum_{i=1}^{n} X_i > 0\right] \le \left[4p(1 - p)\right]^{n/2} \tag{2.4-38}$$

Since for $p < \frac{1}{2}$ we have $4p(1 - p) < 1$, the bound given in Equation 2.4-38 tends to
zero exponentially.
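The bound $[4p(1-p)]^{n/2}$ of this example can be compared with the exact probability that the sum of $n$ binary $\pm 1$ variables is positive, which is a binomial tail. The following sketch is illustrative only; the function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math

def exact_positive_sum_probability(n, p):
    """P[sum X_i > 0] with X_i = +1 w.p. p and -1 w.p. 1 - p.

    The sum equals 2k - n when k of the variables are +1, so it is positive
    exactly when k > n / 2.
    """
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

def chernov_bound(n, p):
    """Chernov bound [4 p (1 - p)]^(n / 2) from the example."""
    return (4.0 * p * (1.0 - p)) ** (n / 2.0)

if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, exact_positive_sum_probability(n, 0.1), chernov_bound(n, 0.1))
```

Both quantities decay exponentially in $n$, with the bound always at or above the exact value.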


2.5 LIMIT THEOREMS FOR SUMS OF RANDOM VARIABLES

If $\{X_i, i = 1, 2, 3, \ldots\}$ represents a sequence of iid random variables, then it is intuitively
clear that the running average of this sequence, i.e.,

$$Y_n = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{2.5-1}$$

should in some sense converge to the average of the random variables. Two limit
theorems, i.e., the law of large numbers (LLN) and the central limit theorem (CLT),
rigorously state how the running average of the random variable behaves as $n$ becomes
large.

The (strong) law of large numbers states that if $\{X_i, i = 1, 2, \ldots\}$ is a sequence of
iid random variables with $E[X_1] < \infty$, then

$$\frac{1}{n} \sum_{i=1}^{n} X_i \longrightarrow E[X_1] \tag{2.5-2}$$

where the type of convergence is convergence almost everywhere (a.e.) or convergence
almost surely (a.s.), meaning the set of points in the probability space for which the
left-hand side does not converge to the right-hand side has zero probability.

The central limit theorem states that if $\{X_i, i = 1, 2, \ldots\}$ is a sequence of iid
random variables with $m = E[X_1] < \infty$ and $\sigma^2 = \text{VAR}[X_1] < \infty$, then we have

$$\frac{\frac{1}{n} \sum_{i=1}^{n} X_i - m}{\sigma / \sqrt{n}} \longrightarrow \mathcal{N}(0, 1) \tag{2.5-3}$$

The type of convergence in the CLT is convergence in distribution, meaning the CDF
of the left-hand side converges to the CDF of $\mathcal{N}(0, 1)$ as $n$ increases.
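Both limit theorems can be observed in a short simulation. The sketch below uses uniform random variables on $(0, 1)$ as an assumed test distribution (mean $1/2$, variance $1/12$): the running average approaches the mean, and the normalized sum $(\frac{1}{n}\sum X_i - m)/(\sigma/\sqrt{n})$ has an empirical CDF close to that of $\mathcal{N}(0, 1)$. All names and parameter choices are illustrative.

```python
import math
import random

UNIFORM_MEAN = 0.5
UNIFORM_SIGMA = math.sqrt(1.0 / 12.0)

def running_average(count, seed=0):
    """Average of `count` iid uniform(0, 1) samples (law of large numbers)."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(count)) / count

def normalized_sum(n, rng):
    """(sample average - m) / (sigma / sqrt(n)) for n uniform(0, 1) samples."""
    average = sum(rng.random() for _ in range(n)) / n
    return (average - UNIFORM_MEAN) / (UNIFORM_SIGMA / math.sqrt(n))

def empirical_cdf_of_normalized_sum(n, trials, x, seed=0):
    """Fraction of trials in which the normalized sum does not exceed x."""
    rng = random.Random(seed)
    return sum(1 for _ in range(trials) if normalized_sum(n, rng) <= x) / trials

def standard_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

if __name__ == "__main__":
    print(running_average(100_000), UNIFORM_MEAN)
    print(empirical_cdf_of_normalized_sum(30, 20_000, 1.0), standard_normal_cdf(1.0))
```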


2.6 COMPLEX RANDOM VARIABLES

A complex random variable $Z = X + jY$ can be considered as a pair of real random
variables $X$ and $Y$. Therefore, we treat a complex random variable as a two-dimensional
random vector with components $X$ and $Y$. The PDF of a complex random variable is
defined to be the joint PDF of its real and imaginary parts. If $X$ and $Y$ are jointly
Gaussian random variables, then $Z$ is a complex Gaussian random variable. The PDF
of a zero-mean complex Gaussian random variable $Z$ with iid real and imaginary parts
is given by

$$p(z) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{2.6-1}$$

$$\phantom{p(z)} = \frac{1}{2\pi\sigma^2}\, e^{-\frac{|z|^2}{2\sigma^2}} \tag{2.6-2}$$

For a complex random variable $Z$, the mean and variance are defined by

$$E[Z] = E[X] + jE[Y] \tag{2.6-3}$$

$$\text{VAR}[Z] = E\left[|Z|^2\right] - \left|E[Z]\right|^2 = \text{VAR}[X] + \text{VAR}[Y] \tag{2.6-4}$$

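2.6-1 Complex Random Vectors

A complex random vector is defined as $\mathbf{Z} = \mathbf{X} + j\mathbf{Y}$, where $\mathbf{X}$ and $\mathbf{Y}$ are real-valued
random vectors of size $n$. We define the following real-valued matrices for a complex
random vector $\mathbf{Z}$.

$$\mathbf{C}_X = E\left[(\mathbf{X} - E[\mathbf{X}])(\mathbf{X} - E[\mathbf{X}])^t\right] \tag{2.6-5}$$

$$\mathbf{C}_Y = E\left[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])^t\right] \tag{2.6-6}$$

$$\mathbf{C}_{XY} = E\left[(\mathbf{X} - E[\mathbf{X}])(\mathbf{Y} - E[\mathbf{Y}])^t\right] \tag{2.6-7}$$

$$\mathbf{C}_{YX} = E\left[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{X} - E[\mathbf{X}])^t\right] \tag{2.6-8}$$

Matrices $\mathbf{C}_X$ and $\mathbf{C}_Y$ are the covariance matrices of real random vectors $\mathbf{X}$ and $\mathbf{Y}$,
respectively, and hence they are symmetric and nonnegative definite. It is clear from
above that $\mathbf{C}_{YX} = \mathbf{C}_{XY}^t$.

The PDF of $\mathbf{Z}$ is the joint PDF of its real and imaginary parts. If we define the
$2n$-dimensional real vector

$$\tilde{\mathbf{Z}} = \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} \tag{2.6-9}$$

then the PDF of the complex vector $\mathbf{Z}$ is the PDF of the real vector $\tilde{\mathbf{Z}}$. It is clear that
$\mathbf{C}_{\tilde{Z}}$, the covariance matrix of $\tilde{\mathbf{Z}}$, can be written as

$$\mathbf{C}_{\tilde{Z}} = \begin{bmatrix} \mathbf{C}_X & \mathbf{C}_{XY} \\ \mathbf{C}_{YX} & \mathbf{C}_Y \end{bmatrix} \tag{2.6-10}$$

We also define the following two, in general complex-valued, matrices

$$\mathbf{C}_Z = E\left[(\mathbf{Z} - E[\mathbf{Z}])(\mathbf{Z} - E[\mathbf{Z}])^H\right] \tag{2.6-11}$$

$$\tilde{\mathbf{C}}_Z = E\left[(\mathbf{Z} - E[\mathbf{Z}])(\mathbf{Z} - E[\mathbf{Z}])^t\right] \tag{2.6-12}$$

where $\mathbf{A}^t$ denotes the transpose and $\mathbf{A}^H$ denotes the Hermitian transpose of $\mathbf{A}$ ($\mathbf{A}$ is
transposed and each element of it is conjugated). $\mathbf{C}_Z$ and $\tilde{\mathbf{C}}_Z$ are called the covariance
and the pseudocovariance of the complex random vector $\mathbf{Z}$, respectively. It is easy to
verify that for any $\mathbf{Z}$, the covariance matrix is Hermitian† and nonnegative definite. The
pseudocovariance is symmetric, since by Equation 2.6-12 it equals its own transpose.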

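From these definitions it is easy to verify the following relations.

$$\mathbf{C}_Z = \mathbf{C}_X + \mathbf{C}_Y + j\left(\mathbf{C}_{YX} - \mathbf{C}_{XY}\right) \tag{2.6-13}$$

$$\tilde{\mathbf{C}}_Z = \mathbf{C}_X - \mathbf{C}_Y + j\left(\mathbf{C}_{XY} + \mathbf{C}_{YX}\right) \tag{2.6-14}$$

$$\mathbf{C}_X = \tfrac{1}{2}\, \text{Re}\left[\mathbf{C}_Z + \tilde{\mathbf{C}}_Z\right] \tag{2.6-15}$$

$$\mathbf{C}_Y = \tfrac{1}{2}\, \text{Re}\left[\mathbf{C}_Z - \tilde{\mathbf{C}}_Z\right] \tag{2.6-16}$$

$$\mathbf{C}_{YX} = \tfrac{1}{2}\, \text{Im}\left[\mathbf{C}_Z + \tilde{\mathbf{C}}_Z\right] \tag{2.6-17}$$

$$\mathbf{C}_{XY} = \tfrac{1}{2}\, \text{Im}\left[\tilde{\mathbf{C}}_Z - \mathbf{C}_Z\right] \tag{2.6-18}$$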


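Proper and Circularly Symmetric Random Vectors

A complex random vector $\mathbf{Z}$ is called proper if its pseudocovariance is zero, i.e., if
$\tilde{\mathbf{C}}_Z = \mathbf{0}$. From Equation 2.6-14 it is clear that for a proper random vector we have

$$\mathbf{C}_X = \mathbf{C}_Y \tag{2.6-19}$$

$$\mathbf{C}_{XY} = -\mathbf{C}_{YX} \tag{2.6-20}$$

Substituting these results into Equations 2.6-13 to 2.6-18 and 2.6-10, we conclude
that for proper random vectors

$$\mathbf{C}_Z = 2\mathbf{C}_X + 2j\mathbf{C}_{YX} \tag{2.6-21}$$

$$\mathbf{C}_X = \mathbf{C}_Y = \tfrac{1}{2}\, \text{Re}\left[\mathbf{C}_Z\right] \tag{2.6-22}$$

$$\mathbf{C}_{YX} = -\mathbf{C}_{XY} = \tfrac{1}{2}\, \text{Im}\left[\mathbf{C}_Z\right] \tag{2.6-23}$$

$$\mathbf{C}_{\tilde{Z}} = \begin{bmatrix} \mathbf{C}_X & \mathbf{C}_{XY} \\ -\mathbf{C}_{XY} & \mathbf{C}_X \end{bmatrix} = \frac{1}{2} \begin{bmatrix} \text{Re}\left[\mathbf{C}_Z\right] & -\text{Im}\left[\mathbf{C}_Z\right] \\ \text{Im}\left[\mathbf{C}_Z\right] & \text{Re}\left[\mathbf{C}_Z\right] \end{bmatrix} \tag{2.6-24}$$

For the special case of $n = 1$, i.e., when we are dealing with a single complex
random variable $Z = X + jY$, the conditions for being proper become

$$\text{VAR}[X] = \text{VAR}[Y] \tag{2.6-25}$$

$$\text{COV}[X, Y] = -\text{COV}[Y, X] \tag{2.6-26}$$

which means that $Z$ is proper if $X$ and $Y$ have equal variances and are uncorrelated. In
this case $\text{VAR}[Z] = 2\,\text{VAR}[X]$. Since in the case of jointly Gaussian random variables
uncorrelated is equivalent to independent, we conclude that a complex Gaussian random
variable $Z$ is proper if and only if its real and imaginary parts are independent with equal
variance. For a zero-mean proper complex Gaussian random variable, the PDF is given
by Equation 2.6-2.

†Matrix $\mathbf{A}$ is Hermitian if $\mathbf{A} = \mathbf{A}^H$. It is skew-Hermitian if $\mathbf{A}^H = -\mathbf{A}$.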
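For a single complex random variable, properness amounts to a vanishing pseudovariance $E[(Z - E[Z])^2]$, alongside the ordinary variance $E[|Z - E[Z]|^2]$. The sketch below estimates both from samples for a proper case (iid real and imaginary parts) and for an improper one (imaginary part equal to the real part); the helper names, sample size, and seed are illustrative assumptions.

```python
import random

def proper_gaussian_samples(sigma, count, seed=0):
    """Z = X + jY with X, Y iid N(0, sigma^2)."""
    rng = random.Random(seed)
    return [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(count)]

def improper_gaussian_samples(sigma, count, seed=0):
    """Z = X + jX: real and imaginary parts fully correlated."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        x = rng.gauss(0.0, sigma)
        samples.append(complex(x, x))
    return samples

def variance_and_pseudovariance(samples):
    """Sample estimates of E[|Z - m|^2] and E[(Z - m)^2]."""
    mean = sum(samples) / len(samples)
    variance = sum(abs(z - mean) ** 2 for z in samples) / len(samples)
    pseudovariance = sum((z - mean) ** 2 for z in samples) / len(samples)
    return variance, pseudovariance

if __name__ == "__main__":
    print(variance_and_pseudovariance(proper_gaussian_samples(1.0, 100_000)))
    print(variance_and_pseudovariance(improper_gaussian_samples(1.0, 100_000)))
```

In the proper case the variance is close to $2\sigma^2$ and the pseudovariance is close to zero; in the improper case the pseudovariance is close to $2j\sigma^2$.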

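If the complex random vector $\mathbf{Z} = \mathbf{X} + j\mathbf{Y}$ is Gaussian, meaning that $\mathbf{X}$ and $\mathbf{Y}$
are jointly Gaussian, then we have

$$p(\mathbf{z}) = p(\tilde{\mathbf{z}}) = \frac{1}{(2\pi)^n \left(\det \mathbf{C}_{\tilde{Z}}\right)^{1/2}}\, e^{-\frac{1}{2} (\tilde{\mathbf{z}} - \tilde{\mathbf{m}})^t \mathbf{C}_{\tilde{Z}}^{-1} (\tilde{\mathbf{z}} - \tilde{\mathbf{m}})} \tag{2.6-27}$$

where

$$\tilde{\mathbf{m}} = E\left[\tilde{\mathbf{Z}}\right] \tag{2.6-28}$$

It can be shown that in the special case where $\mathbf{Z}$ is a proper $n$-dimensional complex
Gaussian random vector, with mean $\mathbf{m} = E[\mathbf{Z}]$ and nonsingular covariance matrix $\mathbf{C}_Z$,
its PDF can be written as

$$p(\mathbf{z}) = \frac{1}{\pi^n \det \mathbf{C}_Z}\, e^{-(\mathbf{z} - \mathbf{m})^H \mathbf{C}_Z^{-1} (\mathbf{z} - \mathbf{m})} \tag{2.6-29}$$

For $n = 1$ and zero mean, $\mathbf{C}_Z = 2\sigma^2$ and this expression agrees with Equation 2.6-2.

A complex random vector $\mathbf{Z}$ is called circularly symmetric or circular if rotating the
vector by any angle does not change its PDF. In other words, a complex random vector
$\mathbf{Z}$ is circularly symmetric if $\mathbf{Z}$ and $e^{j\theta}\mathbf{Z}$ have the same PDF for all $\theta$. In Problem 2.34
we will see that if $\mathbf{Z}$ is circular, then it is zero-mean and proper, i.e., $E[\mathbf{Z}] = \mathbf{0}$ and
$E\left[\mathbf{Z}\mathbf{Z}^t\right] = \mathbf{0}$. In Problem 2.35 we show that if $\mathbf{Z}$ is a zero-mean proper Gaussian
complex vector, then $\mathbf{Z}$ is circular. In other words, for complex Gaussian random
vectors being zero-mean and proper is equivalent to being circular.

In Problem 2.36 we show that if $\mathbf{Z}$ is a proper complex vector, then any affine
transformation of it, i.e., any transform of the form $\mathbf{W} = \mathbf{A}\mathbf{Z} + \mathbf{b}$, is also a proper
complex vector. Since we know that if $\mathbf{Z}$ is Gaussian, so is $\mathbf{W}$, we conclude that if $\mathbf{Z}$ is
a proper Gaussian vector, so is $\mathbf{W}$. For more details on properties of proper and circular
random variables and random vectors, the reader is referred to Neeser and Massey
(1993) and Eriksson and Koivunen (2006).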


2.7 RANDOM PROCESSES

Random processes, stochastic processes, or random signals are fundamental in the study
of communication systems. Modeling information sources and communication channels
requires a good understanding of random processes and techniques for analyzing
them. We assume that the reader has a knowledge of the basic concepts of random
processes including definitions of mean, autocorrelation, cross-correlation, stationarity,
and ergodicity as given in standard texts such as Leon-Garcia (1994), Papoulis and
Pillai (2002), Stark and Woods (2002). In the following paragraphs we present a brief
review of the most important properties of random processes.
The mean $m_X(t)$ and the autocorrelation function of a random process $X(t)$ are
defined as

$$m_X(t) = E[X(t)] \tag{2.7-1}$$

$$R_X(t_1, t_2) = E\left[X(t_1) X^*(t_2)\right] \tag{2.7-2}$$

The cross-correlation function of two random processes $X(t)$ and $Y(t)$ is defined by

$$R_{XY}(t_1, t_2) = E\left[X(t_1) Y^*(t_2)\right] \tag{2.7-3}$$

Note that $R_X(t_2, t_1) = R_X^*(t_1, t_2)$, i.e., $R_X(t_1, t_2)$ is Hermitian. For the cross-correlation
we have $R_{YX}(t_2, t_1) = R_{XY}^*(t_1, t_2)$.


2.7-1 Wide-Sense Stationary Random Processes

Random process $X(t)$ is wide-sense stationary (WSS) if its mean is constant and
$R_X(t_1, t_2) = R_X(\tau)$ where $\tau = t_1 - t_2$. For WSS processes $R_X(-\tau) = R_X^*(\tau)$. Two
processes $X(t)$ and $Y(t)$ are jointly wide-sense stationary if both $X(t)$ and $Y(t)$ are
WSS and $R_{XY}(t_1, t_2) = R_{XY}(\tau)$. For jointly WSS processes $R_{YX}(-\tau) = R_{XY}^*(\tau)$. A
complex process is WSS if its real and imaginary parts are jointly WSS.

The power spectral density (PSD) or power spectrum of a WSS random process
$X(t)$ is a function $S_X(f)$ describing the distribution of power as a function of frequency.
The unit for power spectral density is watts per hertz. The Wiener-Khinchin theorem
states that for a WSS process, the power spectrum is the Fourier transform of the
autocorrelation function $R_X(\tau)$, i.e.,

$$S_X(f) = \mathscr{F}\left[R_X(\tau)\right] \tag{2.7-4}$$

Similarly, the cross spectral density (CSD) of two jointly WSS processes is defined as
the Fourier transform of their cross-correlation function.

$$S_{XY}(f) = \mathscr{F}\left[R_{XY}(\tau)\right] \tag{2.7-5}$$

The cross spectral density satisfies the following symmetry property:

$$S_{XY}(f) = S_{YX}^*(f) \tag{2.7-6}$$

From properties of the autocorrelation function it is easy to verify that the power
spectral density of any real WSS process $X(t)$ is a real, nonnegative, and even function of
$f$. For complex processes, the power spectrum is real and nonnegative, but not necessarily
even. The cross spectral density can be a complex function, even when both $X(t)$ and
$Y(t)$ are real processes.

If $X(t)$ and $Y(t)$ are jointly WSS random processes, then $Z(t) = aX(t) + bY(t)$ is
a WSS random process with autocorrelation and power spectral density given by

$$R_Z(\tau) = |a|^2 R_X(\tau) + |b|^2 R_Y(\tau) + ab^* R_{XY}(\tau) + ba^* R_{YX}(\tau) \tag{2.7-7}$$

$$S_Z(f) = |a|^2 S_X(f) + |b|^2 S_Y(f) + 2\,\text{Re}\left[ab^* S_{XY}(f)\right] \tag{2.7-8}$$

In the special case where $a = b = 1$, we have $Z(t) = X(t) + Y(t)$, which results in

$$R_Z(\tau) = R_X(\tau) + R_Y(\tau) + R_{XY}(\tau) + R_{YX}(\tau) \tag{2.7-9}$$

$$S_Z(f) = S_X(f) + S_Y(f) + 2\,\text{Re}\left[S_{XY}(f)\right] \tag{2.7-10}$$

and when $a = 1$ and $b = j$, we have $Z(t) = X(t) + jY(t)$ and

$$R_Z(\tau) = R_X(\tau) + R_Y(\tau) + j\left(R_{YX}(\tau) - R_{XY}(\tau)\right) \tag{2.7-11}$$

$$S_Z(f) = S_X(f) + S_Y(f) + 2\,\text{Im}\left[S_{XY}(f)\right] \tag{2.7-12}$$


When a WSS process $X(t)$ passes through an LTI system with impulse response
$h(t)$ and transfer function $H(f) = \mathscr{F}[h(t)]$, the output process $Y(t)$ and $X(t)$ are
jointly WSS and the following relations hold:

$$m_Y = m_X \int_{-\infty}^{\infty} h(t)\, dt \tag{2.7-13}$$

$$R_{XY}(\tau) = R_X(\tau) \star h^*(-\tau) \tag{2.7-14}$$

$$R_Y(\tau) = R_X(\tau) \star h(\tau) \star h^*(-\tau) \tag{2.7-15}$$

$$m_Y = m_X H(0) \tag{2.7-16}$$

$$S_{XY}(f) = S_X(f) H^*(f) \tag{2.7-17}$$

$$S_Y(f) = S_X(f) |H(f)|^2 \tag{2.7-18}$$

The power in a WSS process $X(t)$ is the sum of the powers at all frequencies, and
therefore it is the integral of the power spectrum over all frequencies. We can write

$$P_X = \int_{-\infty}^{\infty} S_X(f)\, df \tag{2.7-19}$$
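The filtering relations above are stated for continuous-time processes; a discrete-time analog is easy to simulate and is used here purely as an assumed illustration. For an iid Gaussian input sequence with mean $m_X$ and variance $\sigma^2$ passed through an FIR filter with taps $h[k]$, the output mean is $m_X \sum_k h[k]$ (the analog of $m_Y = m_X H(0)$) and the output variance is $\sigma^2 \sum_k h[k]^2$ (the analog of integrating $S_X(f)|H(f)|^2$ for a flat input spectrum). All names and parameter values below are hypothetical.

```python
import random

def white_gaussian_sequence(mean, sigma, count, seed=0):
    """iid N(mean, sigma^2) samples, a discrete-time stand-in for a WSS input."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(count)]

def fir_filter(x, h):
    """y[n] = sum_k h[k] x[n - k], keeping only outputs with full overlap."""
    taps = len(h)
    return [sum(h[k] * x[n - k] for k in range(taps))
            for n in range(taps - 1, len(x))]

def mean_and_variance(samples):
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, variance

def predicted_output_mean(mean, h):
    """Discrete analog of m_Y = m_X H(0)."""
    return mean * sum(h)

def predicted_output_variance(sigma, h):
    """Output variance for an iid input: sigma^2 times the tap energy."""
    return sigma ** 2 * sum(c * c for c in h)

if __name__ == "__main__":
    h = [0.5, 0.3, 0.2]
    y = fir_filter(white_gaussian_sequence(1.0, 2.0, 100_000), h)
    print(mean_and_variance(y))
    print(predicted_output_mean(1.0, h), predicted_output_variance(2.0, h))
```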


Gaussian Random Processes 

A real random process X(t) is Gaussian if for all positive integers n and for all
$(t_1, t_2, \ldots, t_n)$, the random vector $(X(t_1), X(t_2), \ldots, X(t_n))'$ is a Gaussian random
vector; i.e., random variables $\{X(t_i)\}_{i=1}^{n}$ are jointly Gaussian random variables. Similar
to jointly Gaussian random variables, linear filtering of Gaussian random processes
results in a Gaussian random process, even when the filtering is time-varying.

Two real random processes X(t) and Y(t) are jointly Gaussian if for all positive
integers n, m and all $(t_1, t_2, \ldots, t_n)$ and $(t'_1, t'_2, \ldots, t'_m)$, the random vector

$$\left(X(t_1), X(t_2), \ldots, X(t_n), Y(t'_1), Y(t'_2), \ldots, Y(t'_m)\right)'$$

is a Gaussian vector. For two jointly Gaussian random processes X(t) and Y(t), being
uncorrelated, i.e., having

$$R_{XY}(t + \tau, t) = E[X(t + \tau)]\,E[Y(t)] \quad \text{for all } t \text{ and } \tau \tag{2.7-20}$$

is equivalent to being independent.

A complex process Z(t) = X(t) + jY(t) is Gaussian if X(t) and Y(t) are jointly
Gaussian processes.

White Processes

A process is called a white process if its power spectral density is constant for all
frequencies; this constant value is usually denoted by $N_0/2$:

$$S_X(f) = \frac{N_0}{2} \tag{2.7-21}$$

Using Equation 2.7-19, we see that the power in a white process is infinite, indicating
that white processes cannot exist as physical processes. Although white processes are not
physically realizable, they are very useful, closely modeling some important
physical phenomena, including thermal noise.

Thermal noise is the noise generated in electric devices by thermal agitation of
electrons. Thermal noise can be closely modeled by a random process N(t) having the
following properties:

1. N(t) is a stationary process.

2. N(t) is a zero-mean process.

3. N(t) is a Gaussian process.

4. N(t) is a white process whose power spectral density is given by

$$S_n(f) = \frac{kT}{2} \tag{2.7-22}$$

where T is the ambient temperature in kelvins and k is Boltzmann's constant, equal
to $1.38 \times 10^{-23}$ J/K.
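To get a feel for the magnitudes involved, the following minimal sketch evaluates the thermal noise level, assuming the two-sided power spectral density $S_n(f) = kT/2$ with $k = 1.38 \times 10^{-23}$ J/K. The ambient temperature of 290 K, the 1 MHz bandwidth, and the function names are hypothetical values chosen for illustration.

```python
BOLTZMANN_J_PER_K = 1.38e-23   # Boltzmann's constant k, in J/K

def thermal_noise_psd(temperature_k):
    """Two-sided PSD kT/2 in W/Hz for an ambient temperature in kelvins."""
    return BOLTZMANN_J_PER_K * temperature_k / 2.0

def noise_power(temperature_k, bandwidth_hz):
    """Noise power in W inside the band [-B, B]: (kT/2) * 2B = kTB."""
    return 2.0 * bandwidth_hz * thermal_noise_psd(temperature_k)

T_AMBIENT = 290.0     # hypothetical ambient temperature (K)
BANDWIDTH = 1.0e6     # hypothetical one-sided bandwidth (Hz)

psd = thermal_noise_psd(T_AMBIENT)          # about 2.0e-21 W/Hz
power = noise_power(T_AMBIENT, BANDWIDTH)   # about 4.0e-15 W
```

Restricting the bandwidth is what makes the power finite; over an unbounded band the integral of a constant spectrum diverges, as noted above.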

Discrete-Time Random Processes 

Discrete-time random processes have similar properties to continuous-time processes.
In particular the PSD of a WSS discrete-time random process is defined as the
discrete-time Fourier transform of its autocorrelation function

$$S_X(f) = \sum_{m=-\infty}^{\infty} R_X(m)\, e^{-j2\pi f m} \tag{2.7-23}$$

and the autocorrelation function can be obtained as the inverse Fourier transform of the
power spectral density as

$$R_X(m) = \int_{-1/2}^{1/2} S_X(f)\, e^{j2\pi f m}\, df \tag{2.7-24}$$

The power in a discrete-time random process is given by

$$P_X = R_X(0) = \int_{-1/2}^{1/2} S_X(f)\, df \tag{2.7-25}$$

2.7-2 Cyclostationary Random Processes

A random process X(t) is cyclostationary if its mean and autocorrelation function are
periodic functions with the same period $T_0$. For a cyclostationary process we have

$$m_X(t + T_0) = m_X(t) \tag{2.7-26}$$

$$R_X(t_1 + T_0, t_2 + T_0) = R_X(t_1, t_2) \tag{2.7-27}$$

Cyclostationary processes are encountered frequently in the study of communication
systems because many modulated processes can be modeled as cyclostationary
processes. For a cyclostationary process, the average autocorrelation function is defined
as the average of the autocorrelation function over one period

$$\bar{R}_X(\tau) = \frac{1}{T_0} \int_0^{T_0} R_X(t + \tau, t)\, dt \tag{2.7-28}$$

The (average) power spectral density for a cyclostationary process is defined as the
Fourier transform of the average autocorrelation function, i.e.,

$$S_X(f) = \mathscr{F}\left[\bar{R}_X(\tau)\right] \tag{2.7-29}$$

EXAMPLE 2.7-1. Let $\{a_n\}$ denote a discrete-time WSS random process with mean
$m_a(n) = E[a_n] = m_a$ and autocorrelation function $R_a(m) = E[a_{n+m} a_n^*]$. Define the
random process

$$X(t) = \sum_{n=-\infty}^{\infty} a_n\, g(t - nT) \tag{2.7-30}$$

for an arbitrary deterministic function g(t). We have

$$m_X(t) = E[X(t)] = m_a \sum_{n=-\infty}^{\infty} g(t - nT) \tag{2.7-31}$$

This function is obviously periodic with period T. For the autocorrelation function we
have

$$R_X(t + \tau, t) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} E[a_n a_m^*]\, g(t + \tau - nT)\, g^*(t - mT) \tag{2.7-32}$$

$$= \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} R_a(n - m)\, g(t + \tau - nT)\, g^*(t - mT) \tag{2.7-33}$$

It can readily be verified that

$$R_X(t + \tau + T, t + T) = R_X(t + \tau, t) \tag{2.7-34}$$

Equations 2.7-31 and 2.7-34 show that X(t) is a cyclostationary process.

2.7-3 Proper and Circular Random Processes

For a complex random process Z(t) = X(t) + jY(t), we define the covariance and the
pseudocovariance, similar to the case of complex random vectors, as

$$C_Z(t + \tau, t) = E\left[Z(t + \tau) Z^*(t)\right] \tag{2.7-35}$$

$$\tilde{C}_Z(t + \tau, t) = E\left[Z(t + \tau) Z(t)\right] \tag{2.7-36}$$

It is easy to verify that similar to Equations 2.6-13 and 2.6-14, we have

$$C_Z(t + \tau, t) = C_X(t + \tau, t) + C_Y(t + \tau, t) + j\left(C_{YX}(t + \tau, t) - C_{XY}(t + \tau, t)\right) \tag{2.7-37}$$

$$\tilde{C}_Z(t + \tau, t) = C_X(t + \tau, t) - C_Y(t + \tau, t) + j\left(C_{YX}(t + \tau, t) + C_{XY}(t + \tau, t)\right) \tag{2.7-38}$$

A complex random process Z(t) is proper if its pseudocovariance is zero, i.e.,
$\tilde{C}_Z(t + \tau, t) = 0$. For a proper random process we have

$$C_X(t + \tau, t) = C_Y(t + \tau, t) \tag{2.7-39}$$

$$C_{YX}(t + \tau, t) = -C_{XY}(t + \tau, t) \tag{2.7-40}$$

and

$$C_Z(t + \tau, t) = 2C_X(t + \tau, t) + j2C_{YX}(t + \tau, t) \tag{2.7-41}$$

If Z(t) is a zero-mean process, then all covariances in Equations 2.7-35 to
2.7-41 are substituted with auto- or cross-correlations. When Z(t) is WSS, all auto-
and cross-correlations are functions of $\tau$ only. A proper Gaussian random process is a
random process for which, for all n and all $(t_1, t_2, \ldots, t_n)$, the complex random vector
$(Z(t_1), Z(t_2), \ldots, Z(t_n))'$ is a proper Gaussian vector.

A complex random process Z(t) is circular if for all $\theta$, Z(t) and $e^{j\theta} Z(t)$ have the
same statistical properties. Similar to the case of complex vectors, it can be shown
that if Z(t) is circular, then it is both proper and zero-mean. For the case of Gaussian
processes, being proper and zero-mean is equivalent to being circular. Also similar to
the case of complex vectors, passing a circular Gaussian process through a linear (not
necessarily time-invariant) system results in a circular Gaussian process at the output.


2.7-4 Markov Chains 

Markov chains are discrete-time, discrete-valued random processes in which the current
value depends on the entire past values only through the most recent values. In a
jth-order Markov chain, the current value depends on the past values only through the most
recent j values, i.e.,

$$P[X_n = x_n \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \ldots] = P[X_n = x_n \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \ldots, X_{n-j} = x_{n-j}] \tag{2.7-42}$$

It is convenient to consider the set of the most recent j values as the state of the
Markov chain. With this definition the current state of the Markov chain,
i.e., $S_n = (X_n, X_{n-1}, \ldots, X_{n-j+1})$, depends only on the most recent state
$S_{n-1} = (X_{n-1}, X_{n-2}, \ldots, X_{n-j})$. That is,

$$P[S_n = s_n \mid S_{n-1} = s_{n-1}, S_{n-2} = s_{n-2}, \ldots] = P[S_n = s_n \mid S_{n-1} = s_{n-1}] \tag{2.7-43}$$

which represents a first-order Markov chain in terms of the state variable $S_n$. Note that
with this notation, $X_n$ is a deterministic function of state $S_n$. We can generalize this
notion to the case where the state evolves according to Equation 2.7-43 but the output,
or the value of the random process $X_n$, depends on state $S_n$ through a conditional
probability mass function

$$P[X_n = x_n \mid S_n = s_n] \tag{2.7-44}$$

With this background, we define a Markov chain† as a finite-state machine with
state at time n, denoted by $S_n$, taking values in the set $\{1, 2, \ldots, S\}$ such that Equation
2.7-43 holds and the value of the random process at time n, denoted by $X_n$ and taking
values in a discrete set, depends statistically on the state through the conditional PMF
$P[X_n = x_n \mid S_n = s_n]$.

†Strictly speaking, this is the definition of a finite-state Markov chain (FSMC), which is the only class of
Markov chains studied in this book.

The internal development of the process depends on the set of states and the
probabilistic law that governs the transitions between the states. If $P[S_n \mid S_{n-1}]$ is independent
of n (time), the Markov chain is called homogeneous. In this case the probability of
transition from state i to state j, $1 \le i, j \le S$, is independent of n and is denoted
by $P_{ij}$

$$P_{ij} = P[S_n = j \mid S_{n-1} = i] \tag{2.7-45}$$

In a homogeneous Markov chain, we define the state transition matrix, or one-step
transition matrix, P as a matrix with elements $P_{ij}$. The element at row i and
column j denotes the probability of a direct transition from state i to state j. P is a
matrix with nonnegative elements, and the sum of each row of it is equal to 1. The
n-step transition matrix gives the probabilities of moving from i to j in n steps. For
discrete-time homogeneous Markov chains, the n-step transition matrix is equal to $P^n$.
All Markov chains studied here are assumed to be homogeneous.

The row vector $p(n) = [p_1(n)\ p_2(n)\ \cdots\ p_S(n)]$, where $p_i(n)$ denotes the
probability of being in state i at time n, is the state probability vector of the Markov chain
at time n. From this definition it is clear that

$$p(n) = p(n - 1)P \tag{2.7-46}$$

and

$$p(n) = p(0)P^n \tag{2.7-47}$$

If $\lim_{n \to \infty} P^n$ exists and all its rows are equal, we denote each row of the limit by
p, i.e.,

$$\lim_{n \to \infty} P^n = \begin{bmatrix} p \\ p \\ \vdots \\ p \end{bmatrix} \tag{2.7-48}$$

In this case

$$\lim_{n \to \infty} p(n) = \lim_{n \to \infty} p(0)P^n = p(0) \begin{bmatrix} p \\ p \\ \vdots \\ p \end{bmatrix} = p \tag{2.7-49}$$

This means that starting from any initial probability vector p(0), the Markov chain
stabilizes at the state probability vector given by p, which is called the steady-state,
equilibrium, or stationary state probability distribution of the Markov chain. Since after
reaching the steady-state probability distribution these probabilities do not change, p
can be obtained as the solution of the equation

$$pP = p \tag{2.7-50}$$

that satisfies the conditions $p_i \ge 0$ and $\sum_i p_i = 1$ (i.e., it is a probability vector). If a
Markov chain starts from state p, then it will always remain in this state, because pP =
p. Some basic questions are the following: Does pP = p always have a solution that is
a probability vector? If yes, under what conditions is this solution unique? Under what
conditions does $\lim_{n \to \infty} P^n$ exist? If the limit exists, does the limit have equal rows?

If it is possible to move from any state of a Markov chain to any other state in a
finite number of steps, the Markov chain is called irreducible. The period of state i of a
Markov chain is the greatest common divisor (GCD) of all n such that $P_{ii}(n) > 0$, where
$P_{ii}(n)$ denotes the (i, i) element of $P^n$. State
i is aperiodic if its period is equal to 1. A finite-state Markov chain is called ergodic if
it is irreducible and all its states are aperiodic.

It can be shown that in an ergodic Markov chain $\lim_{n \to \infty} P^n$ always exists and
all rows of the limit are equal, i.e., Equation 2.7-48 holds. In this case a unique
stationary (steady-state) state probability distribution exists and starting from any initial
state probability vector, the Markov chain ends up in the steady-state state probability
vector p.

EXAMPLE 2.7-2. A Markov chain with four states is described by the finite-state
diagram shown in Figure 2.7-1. For this Markov chain we have

$$P = \begin{bmatrix} \frac{1}{2} & \frac{1}{3} & 0 & \frac{1}{6} \\ \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ 0 & \frac{1}{4} & 0 & \frac{3}{4} \\ \frac{5}{6} & 0 & \frac{1}{6} & 0 \end{bmatrix} \tag{2.7-51}$$

FIGURE 2.7-1
State transition diagram for a FSMC.

It is easily verified that this Markov chain is irreducible and aperiodic, and thus ergodic.
To find the steady-state probability distribution, we can either find the limit of $P^n$ as
$n \to \infty$ or solve Equation 2.7-50. The result is

$$p = [0.49541 \quad 0.19725 \quad 0.12844 \quad 0.17889] \tag{2.7-52}$$
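The limiting procedure can be sketched numerically. The following minimal example repeatedly applies the update $p(n) = p(n-1)P$ of Equation 2.7-46 until the state probability vector settles. The transition matrix used here is an assumption: it is the four-state matrix taken for this example, chosen so that its rows sum to 1 and its stationary vector agrees with Equation 2.7-52. The function names and the number of iterations are likewise illustrative choices.

```python
def step(p, P):
    """One update p(n) = p(n-1) P for a row vector p."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def steady_state(P, p0, n_steps=200):
    """Approximate the limit of p(0) P^n by applying step() n_steps times."""
    p = list(p0)
    for _ in range(n_steps):
        p = step(p, P)
    return p

# Assumed transition matrix of the four-state chain (each row sums to 1).
P = [
    [1 / 2, 1 / 3, 0.0, 1 / 6],
    [1 / 2, 0.0, 1 / 2, 0.0],
    [0.0, 1 / 4, 0.0, 3 / 4],
    [5 / 6, 0.0, 1 / 6, 0.0],
]

# Two different initial distributions lead to the same limit.
p_from_state_1 = steady_state(P, [1.0, 0.0, 0.0, 0.0])
p_from_state_4 = steady_state(P, [0.0, 0.0, 0.0, 1.0])
```

Both runs converge to the same vector, which is a fixed point of the update, i.e., a solution of $pP = p$. For larger chains, solving the linear system directly is usually preferable to iterating.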


2.8 SERIES EXPANSION OF RANDOM PROCESSES

Series expansion of random processes results in expressing the random processes in 
terms of a sequence of random variables as coefficients of orthogonal or orthonormal 
basis functions. This type of expansion reduces working with random processes to work- 
ing with random variables, which in many cases are easier to handle. In the following 
we describe two types of series expansions for random processes. First we describe the 
sampling theorem for band-limited random processes, and then we continue with the 
Karhunen-Loeve expansion of random processes, which is a more general expansion. 


2.8-1 Sampling Theorem for Band-Limited Random Processes 

A deterministic real signal x(t) with Fourier transform X(f) is called band-limited if
X(f) = 0 for |f| > W, where W is the highest frequency contained in x(t). Such a
signal is uniquely represented by samples of x(t) taken at a rate of $f_s \ge 2W$ samples/s.
The minimum rate $f_N = 2W$ samples/s is called the Nyquist rate. For complex-valued
signals W is one-half of the frequency support of the signal; i.e., if $W_1$ and $W_2$
are the lowest and the highest frequency components of the signal, respectively, then
$2W = W_2 - W_1$. The signal can be perfectly reconstructed from its sampled values if
the sampling rate is at least equal to 2W. The difference, however, is that the sampled
values are complex in this case, and for specifying each sample, two real numbers are
required. This means that a real signal can be perfectly described in terms of 2W real
numbers per second, or it has 2W degrees of freedom or real dimensions per second.
For a complex signal the number of degrees of freedom is 4W per second, which is
equivalent to 2W complex dimensions or 4W real dimensions per second.

Sampling below the Nyquist rate results in frequency aliasing. The band-limited
signal sampled at the Nyquist rate can be reconstructed from its samples by use of the
interpolation formula

$$x(t) = \sum_{n=-\infty}^{\infty} x\left(\frac{n}{2W}\right) \mathrm{sinc}\left[2W\left(t - \frac{n}{2W}\right)\right] \tag{2.8-1}$$

where $\{x(n/2W)\}$ are the samples of x(t) taken at t = n/2W, n = 0, ±1, ±2, ....
Equivalently, x(t) can be reconstructed by passing the sampled signal through an ideal
lowpass filter with impulse response h(t) = sinc(2Wt). Figure 2.8-1 illustrates the
signal reconstruction process based on ideal interpolation. Note that the expansion of
x(t) as given by Equation 2.8-1 is an orthogonal expansion and not an orthonormal
expansion since

$$\int_{-\infty}^{\infty} \mathrm{sinc}\left[2W\left(t - \frac{n}{2W}\right)\right] \mathrm{sinc}\left[2W\left(t - \frac{m}{2W}\right)\right] dt = \begin{cases} \frac{1}{2W} & n = m \\ 0 & n \ne m \end{cases} \tag{2.8-2}$$
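The interpolation formula can be tried out numerically. The following minimal sketch evaluates a truncated version of the sum, using sinc(x) = sin(πx)/(πx) and the identity sinc[2W(t − n/2W)] = sinc(2Wt − n). The test signal $\mathrm{sinc}^2(Wt)$, which is band-limited to |f| ≤ W, the value W = 1, the truncation to |n| ≤ 200, and the function names are all assumptions made for illustration.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, n_start, W, t):
    """Truncated interpolation sum.

    samples[k] holds x(n / 2W) for n = n_start + k; terms outside the
    stored range are dropped, so the result is only approximate.
    """
    return sum(
        x_n * sinc(2.0 * W * t - (n_start + k))
        for k, x_n in enumerate(samples)
    )

W = 1.0                                   # hypothetical bandwidth
N = 200                                   # keep samples for n = -N, ..., N

def x(t):
    """Hypothetical test signal, band-limited to |f| <= W."""
    return sinc(W * t) ** 2

samples = [x(n / (2.0 * W)) for n in range(-N, N + 1)]

t_between = 0.3                           # a point between sampling instants
x_interp = reconstruct(samples, -N, W, t_between)
x_true = x(t_between)
```

Because the test signal decays like $1/t^2$, the discarded tail of the sum is small and the interpolated value closely matches the true one. A more slowly decaying signal would need more terms for the same accuracy.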


FIGURE 2.8-1
Sampling and reconstruction from samples.

A stationary stochastic process X(t) is said to be band-limited if its power spectral
density $S_X(f) = 0$ for |f| > W. Since $S_X(f)$ is the Fourier transform of the
autocorrelation function $R_X(\tau)$, it follows that $R_X(\tau)$ can be represented as

$$R_X(\tau) = \sum_{n=-\infty}^{\infty} R_X\left(\frac{n}{2W}\right) \mathrm{sinc}\left[2W\left(\tau - \frac{n}{2W}\right)\right] \tag{2.8-3}$$

where $\{R_X(n/2W)\}$ are samples of $R_X(\tau)$ taken at $\tau = n/2W$, n = 0, ±1, ±2, ....
Now, if X(t) is a band-limited stationary stochastic process, then X(t) can be
represented as

$$X(t) = \sum_{n=-\infty}^{\infty} X\left(\frac{n}{2W}\right) \mathrm{sinc}\left[2W\left(t - \frac{n}{2W}\right)\right] \tag{2.8-4}$$

where $\{X(n/2W)\}$ are samples of X(t) taken at t = n/2W, n = 0, ±1, ±2, .... This is
the sampling representation for a stationary stochastic process. The samples are random
variables that are described statistically by appropriate joint probability density
functions. If X(t) is a WSS process, then random variables $\{X(n/2W)\}$ represent a WSS
discrete-time random process. The autocorrelation of the sample random variables is
given by

$$E\left[X\left(\frac{n}{2W}\right) X^*\left(\frac{m}{2W}\right)\right] = R_X\left(\frac{n - m}{2W}\right) = \int_{-W}^{W} S_X(f)\, e^{j2\pi f \frac{n - m}{2W}}\, df \tag{2.8-5}$$

If the process X(t) is filtered white Gaussian noise, then it is zero-mean and its power
spectrum is flat in the [−W, W] interval. In this case the samples are uncorrelated, and
since they are Gaussian, they are independent as well.
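The reason the samples are uncorrelated can be seen from the autocorrelation function of a flat spectrum. The following minimal sketch assumes a power spectral density equal to $N_0/2$ on [−W, W], whose inverse Fourier transform is $R_X(\tau) = N_0 W\,\mathrm{sinc}(2W\tau)$, and evaluates it at the sampling lags $\tau = k/2W$. The numerical values of `N0` and `W` and the function name are hypothetical.

```python
import math

def autocorr_flat(tau, N0, W):
    """R_X(tau) = N0 * W * sinc(2 W tau) for a PSD of N0/2 on [-W, W]."""
    x = 2.0 * W * tau
    if x == 0:
        return N0 * W
    return N0 * W * math.sin(math.pi * x) / (math.pi * x)

N0 = 2.0        # hypothetical noise level
W = 4.0e3       # hypothetical bandwidth in Hz

# Autocorrelation at lags k / 2W, i.e., between samples k steps apart.
lags = [autocorr_flat(k / (2.0 * W), N0, W) for k in range(4)]
```

The lag-zero value equals the power $N_0 W$, while the values at all nonzero sampling lags vanish up to floating-point rounding, so distinct samples are uncorrelated.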

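The signal representation in Equation 2.8-4 is easily established by showing that
(Problem 2.44)

$$E\left[\left|X(t) - \sum_{n=-\infty}^{\infty} X\left(\frac{n}{2W}\right) \mathrm{sinc}\left[2W\left(t - \frac{n}{2W}\right)\right]\right|^2\right] = 0 \tag{2.8-6}$$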


Hence, equality between the sampling representation and the stochastic process X(t) 
holds in the sense that the mean square error is zero. 


2.8-2 The Karhunen-Loeve Expansion 

The sampling theorem presented above gives a straightforward method for orthogonal 
expansion of band-limited processes. In this section we present the Karhunen-Loeve 
expansion, an orthonormal expansion that applies to a large class of random processes 
and results in uncorrelated random variables as expansion coefficients. We present only 
the results of the Karhunen-Loeve expansion. The reader is referred to Van Trees (1968) 
or Loeve (1955) for details. 

There are many ways in which a random process can be expanded in terms of a
sequence of random variables $\{X_n\}$ and an orthonormal basis $\{\phi_n(t)\}$. However, if we
require the additional condition that the random variables $X_n$ be mutually uncorrelated,
then the orthonormal bases have to be the solutions of an eigenfunction problem given
by an integral equation whose kernel is the autocovariance function of the random
process. Solving this integral equation results in the orthonormal basis $\{\phi_n(t)\}$, and
projecting the random process on this basis results in the sequence of uncorrelated
random variables $\{X_n\}$.

The Karhunen-Loeve expansion states that under mild conditions, a random process
X(t) with autocovariance function

$$C_X(t_1, t_2) = R_X(t_1, t_2) - m_X(t_1)\, m_X^*(t_2) \tag{2.8-7}$$

can be expanded over an interval of interest [a, b] in terms of an orthonormal basis
$\{\phi_n(t)\}_{n=1}^{\infty}$ such that the coefficients of expansion are uncorrelated. The $\phi_n(t)$'s are
solutions (eigenfunctions) of the integral equation

$$\int_a^b C_X(t_1, t_2)\, \phi_n(t_2)\, dt_2 = \lambda_n \phi_n(t_1), \quad a < t_1 < b \tag{2.8-8}$$

with appropriate normalization such that

$$\int_a^b |\phi_n(t)|^2\, dt = 1$$

The Karhunen-Loeve expansion is given by

$$\hat{X}(t) = \sum_{n=1}^{\infty} X_n \phi_n(t), \quad a < t < b \tag{2.8-9}$$

with the following properties:

1. Random variables $X_n$ denoting the coefficients of the expansion are projections of
the random process X(t) on the basis functions, i.e.,

$$X_n = \langle X(t), \phi_n(t) \rangle = \int_a^b X(t)\, \phi_n^*(t)\, dt \tag{2.8-10}$$

2. Random variables $X_n$ are mutually uncorrelated. Moreover, the variance of $X_n$
is $\lambda_n$.

$$\mathrm{COV}[X_n, X_m] = \begin{cases} \lambda_n & n = m \\ 0 & n \ne m \end{cases} \tag{2.8-11}$$

3. We have

$$E[\hat{X}(t)] = E[X(t)] = m_X(t), \quad a < t < b \tag{2.8-12}$$

4. $\hat{X}(t)$ is equal to X(t) in the mean square sense

$$E\left[|X(t) - \hat{X}(t)|^2\right] = 0, \quad a < t < b \tag{2.8-13}$$

5. The covariance $C_X(t_1, t_2)$ can be expanded in terms of the bases and the eigenvalues
as given in Equation 2.8-14. This result is known as Mercer's theorem.

$$C_X(t_1, t_2) = \sum_{n=1}^{\infty} \lambda_n \phi_n(t_1)\, \phi_n^*(t_2), \quad a < t_1, t_2 < b \tag{2.8-14}$$

6. The eigenfunctions $\{\phi_n(t)\}_{n=1}^{\infty}$ form a complete basis for expansion of all signals
g(t) which have finite energy in the interval [a, b]. In other words, if g(t) is such
that

$$\int_a^b |g(t)|^2\, dt < \infty$$

then we can expand it in terms of $\{\phi_n(t)\}$ as

$$g(t) = \sum_{n=1}^{\infty} g_n \phi_n(t), \quad a < t < b \tag{2.8-15}$$

where

$$g_n = \langle g(t), \phi_n(t) \rangle = \int_a^b g(t)\, \phi_n^*(t)\, dt \tag{2.8-16}$$

Equation 2.8-13, which states the Karhunen-Loeve expansion, is usually written
in the form

$$X(t) = \sum_{n=1}^{\infty} X_n \phi_n(t), \quad a < t < b \tag{2.8-17}$$

where it is understood that the equality is in the mean square sense. The $\{\phi_n(t)\}$ are
obtained by solving Equation 2.8-8 and normalizing the solutions, and the coefficients
$\{X_n\}$ are obtained by using Equation 2.8-10.

It is worthwhile noting that the Karhunen-Loeve expansion applies to both WSS and
nonstationary processes. In the special case where the process is zero-mean, the
autocovariance function $C_X(t_1, t_2)$ is substituted with the autocorrelation function $R_X(t_1, t_2)$.
If the process X(t) is a Gaussian process, $\{X_n\}$ are independent Gaussian random
variables.
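A finite-dimensional analogue may help make these properties concrete. In the following minimal sketch the process is replaced by a vector of N samples, the integral equation becomes the eigenvalue problem of the N × N covariance matrix, and the eigenvectors play the role of the basis functions. The covariance model $C[i, j] = \rho^{|i-j|}$, the values of `N` and `rho`, and the function name `kl_basis` are assumptions chosen for illustration; this is a discrete sketch, not the continuous-time expansion itself.

```python
import numpy as np

def kl_basis(C):
    """Eigenvalues (descending) and orthonormal eigenvectors (as columns)
    of a Hermitian covariance matrix C."""
    lam, phi = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    return lam[order], phi[:, order]

# Hypothetical covariance of N zero-mean samples: C[i, j] = rho ** |i - j|.
N, rho = 8, 0.9
idx = np.arange(N)
C = rho ** np.abs(idx[:, None] - idx[None, :])

lam, phi = kl_basis(C)

# Covariance of the coefficients X_n = <X, phi_n>; it equals phi^H C phi.
coef_cov = phi.conj().T @ C @ phi

# Discrete counterpart of the Mercer expansion: C = sum_n lam_n phi_n phi_n^H.
C_rebuilt = (phi * lam) @ phi.conj().T
```

Here `coef_cov` is diagonal with the eigenvalues on the diagonal, which mirrors the statement that the coefficients are uncorrelated with variances $\lambda_n$, and `C_rebuilt` reproduces `C`.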


EXAMPLE 2.8-1. Let X(t) be a zero-mean white process with power spectral density
$N_0/2$. To derive the Karhunen-Loeve expansion for this process over an arbitrary interval
[a, b], we have to solve the integral equation

$$\int_a^b \frac{N_0}{2}\, \delta(t_1 - t_2)\, \phi_n(t_2)\, dt_2 = \lambda_n \phi_n(t_1), \quad a < t_1 < b \tag{2.8-18}$$

where $\frac{N_0}{2} \delta(t_1 - t_2)$ is the autocorrelation function of the white process. Using the sifting
property of the impulse function, we have

$$\frac{N_0}{2}\, \phi_n(t_1) = \lambda_n \phi_n(t_1), \quad a < t_1 < b \tag{2.8-19}$$

From this equation we see that $\phi_n(t)$ can be any arbitrary function. Therefore, any
orthonormal basis can be used for expansion of white processes, and all coefficients of
the expansion $X_n$ will have the same variance of $N_0/2$.


2.9 BANDPASS AND LOWPASS RANDOM PROCESSES

In general, bandpass and lowpass random processes can be defined as WSS processes
X(t) for which the autocorrelation function $R_X(\tau)$ is either a bandpass or a lowpass
signal. Recall that the autocorrelation function is an ordinary deterministic function
with a Fourier transform which represents the power spectral density of the random
process X(t). Therefore, for a bandpass process the power spectral density is located
around frequencies $\pm f_0$, and for lowpass processes the power spectral density is located
around zero frequency.

To be more specific, we define a bandpass (or narrowband) process as a real,
zero-mean, and WSS random process whose autocorrelation function is a bandpass signal.

Inspired by Equations 2.1-11, we define the in-phase and quadrature components
of a bandpass random process X(t) as

$$\begin{aligned} X_i(t) &= X(t) \cos 2\pi f_0 t + \hat{X}(t) \sin 2\pi f_0 t \\ X_q(t) &= \hat{X}(t) \cos 2\pi f_0 t - X(t) \sin 2\pi f_0 t \end{aligned} \tag{2.9-1}$$

We will now show that

1. $X_i(t)$ and $X_q(t)$ are jointly WSS zero-mean random processes.

2. $X_i(t)$ and $X_q(t)$ have the same power spectral density.

3. $X_i(t)$ and $X_q(t)$ are both lowpass processes; i.e., their power spectral density is
located around f = 0.

We also define the lowpass equivalent process $X_l(t)$ as

$$X_l(t) = X_i(t) + jX_q(t) \tag{2.9-2}$$

and we will derive an expression for its autocorrelation function and power spectral
density. In addition we will see that $X_l(t)$ is a proper random process.

Since X(t) by assumption is zero-mean, so is $\hat{X}(t)$, its Hilbert transform. This is
obvious since the Hilbert transform is just a filtering operation. From this observation,
it is clear that $X_i(t)$ and $X_q(t)$ are both zero-mean processes.

To derive the autocorrelation function of $X_i(t)$, we have

$$\begin{aligned} R_{X_i}(t + \tau, t) &= E[X_i(t + \tau) X_i(t)] \\ &= E\big[\big(X(t + \tau) \cos 2\pi f_0 (t + \tau) + \hat{X}(t + \tau) \sin 2\pi f_0 (t + \tau)\big) \\ &\qquad \times \big(X(t) \cos 2\pi f_0 t + \hat{X}(t) \sin 2\pi f_0 t\big)\big] \end{aligned} \tag{2.9-3}$$

Expanding this relation, we have

$$\begin{aligned} R_{X_i}(t + \tau, t) &= R_X(\tau) \cos 2\pi f_0 (t + \tau) \cos 2\pi f_0 t \\ &\quad + R_{X\hat{X}}(t + \tau, t) \cos 2\pi f_0 (t + \tau) \sin 2\pi f_0 t \\ &\quad + R_{\hat{X}X}(t + \tau, t) \sin 2\pi f_0 (t + \tau) \cos 2\pi f_0 t \\ &\quad + R_{\hat{X}\hat{X}}(t + \tau, t) \sin 2\pi f_0 (t + \tau) \sin 2\pi f_0 t \end{aligned} \tag{2.9-4}$$

Since the Hilbert transform is the result of passing the process through an LTI
system, we conclude that X(t) and $\hat{X}(t)$ are jointly WSS and therefore all the auto- and
cross-correlations in Equation 2.9-4 are functions of $\tau$ only. Using Equations 2.7-17
and 2.7-18, we can easily show that (see Problem 2.56)

$$\begin{aligned} R_{X\hat{X}}(\tau) &= -\hat{R}_X(\tau) \\ R_{\hat{X}X}(\tau) &= \hat{R}_X(\tau) \\ R_{\hat{X}\hat{X}}(\tau) &= R_X(\tau) \end{aligned} \tag{2.9-5}$$

Substituting these results into Equation 2.9-4 and using standard trigonometric
identities yield

$$R_{X_i}(\tau) = R_X(\tau) \cos(2\pi f_0 \tau) + \hat{R}_X(\tau) \sin(2\pi f_0 \tau) \tag{2.9-6}$$

Similarly, we can show that

$$R_{X_q}(\tau) = R_{X_i}(\tau) = R_X(\tau) \cos(2\pi f_0 \tau) + \hat{R}_X(\tau) \sin(2\pi f_0 \tau) \tag{2.9-7}$$

$$R_{X_i X_q}(\tau) = -R_{X_q X_i}(\tau) = R_X(\tau) \sin(2\pi f_0 \tau) - \hat{R}_X(\tau) \cos(2\pi f_0 \tau) \tag{2.9-8}$$

These relations show that $X_i(t)$ and $X_q(t)$ are zero-mean jointly WSS processes with
equal autocorrelation functions (and thus equal power spectral densities).

To derive the common power spectral density of $X_i(t)$ and $X_q(t)$ and their cross
spectral density, we derive the Fourier transforms of Equations 2.9-7 and 2.9-8. We
need to use the modulation property of the Fourier transform and the fact that the Fourier
transform of $\hat{R}_X(\tau)$ is equal to $-j\,\mathrm{sgn}(f) S_X(f)$. Given these facts, it is straightforward
to derive

$$S_{X_i}(f) = S_{X_q}(f) = \begin{cases} S_X(f + f_0) + S_X(f - f_0) & |f| < f_0 \\ 0 & \text{otherwise} \end{cases} \tag{2.9-9}$$

$$S_{X_i X_q}(f) = -S_{X_q X_i}(f) = \begin{cases} j\left[S_X(f + f_0) - S_X(f - f_0)\right] & |f| < f_0 \\ 0 & \text{otherwise} \end{cases} \tag{2.9-10}$$

Equation 2.9-9 states that the common power spectral density of the in-phase and
quadrature components of X(t) is obtained by shifting the power spectral density of X(t)
to left and right by $f_0$ and adding the results and then removing all components outside
$[-f_0, f_0]$. This result also shows that both $X_i(t)$ and $X_q(t)$ are lowpass processes.
From Equation 2.9-10 we see that if $S_X(f + f_0) = S_X(f - f_0)$ for $|f| < f_0$, then
$S_{X_i X_q}(f) = 0$ and consequently, $R_{X_i X_q}(\tau) = 0$. Since $X_i(t)$ and $X_q(t)$ are zero-mean
processes, from $R_{X_i X_q}(\tau) = 0$ we conclude that under this condition $X_i(t)$ and $X_q(t)$
are uncorrelated. One of the cases where we have $S_X(f + f_0) = S_X(f - f_0)$ for $|f| < f_0$
occurs when $S_X(f)$ is symmetric around $f_0$, in which case the in-phase and quadrature
components will be uncorrelated processes.
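The shift-and-add rule can be written out directly. The following minimal sketch evaluates the in-phase/quadrature power spectral density and the cross spectral density from a given bandpass spectrum, following the rule described above: shift by $\pm f_0$, combine, and keep only $|f| < f_0$. The bandpass spectrum used as input (a level of $N_0/2$ within W of $\pm f_0$), the numerical values, and the function names are hypothetical choices for illustration.

```python
def bandpass_psd(f, N0, f0, W):
    """Hypothetical bandpass PSD: N0/2 within W of +f0 or -f0, zero elsewhere."""
    return N0 / 2.0 if abs(abs(f) - f0) < W else 0.0

def inphase_psd(f, S_X, f0):
    """Common PSD of the in-phase and quadrature components:
    S_X(f + f0) + S_X(f - f0) for |f| < f0, zero otherwise."""
    return S_X(f + f0) + S_X(f - f0) if abs(f) < f0 else 0.0

def cross_psd(f, S_X, f0):
    """Cross spectral density of the in-phase and quadrature components:
    j [S_X(f + f0) - S_X(f - f0)] for |f| < f0, zero otherwise."""
    return 1j * (S_X(f + f0) - S_X(f - f0)) if abs(f) < f0 else 0.0

N0, f0, W = 2.0, 100.0, 10.0          # hypothetical parameters, with W < f0

def S_X(f):
    return bandpass_psd(f, N0, f0, W)

s_i_inside = inphase_psd(5.0, S_X, f0)    # inside |f| < W
s_i_outside = inphase_psd(20.0, S_X, f0)  # outside |f| < W
s_cross = cross_psd(5.0, S_X, f0)         # vanishes for a symmetric spectrum
```

For this symmetric spectrum the two shifted copies coincide around f = 0, so the in-phase spectrum has level $N_0$ for |f| < W and the cross spectral density is zero.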

We define the complex process $X_l(t) = X_i(t) + jX_q(t)$ as the lowpass equivalent
of X(t). Since $X_i(t)$ and $X_q(t)$ are both lowpass processes, we conclude that $X_l(t)$ is
also a lowpass process. Comparing Equations 2.9-7 and 2.9-8 with Equations 2.7-39
and 2.7-40, we can conclude that $X_l(t)$ is a proper random process, and therefore, from
Equation 2.7-41, we have

$$R_{X_l}(\tau) = 2R_{X_i}(\tau) + 2jR_{X_q X_i}(\tau) \tag{2.9-11}$$

$$= 2\left[R_X(\tau) + j\hat{R}_X(\tau)\right] e^{-j2\pi f_0 \tau} \tag{2.9-12}$$

where we have used Equations 2.9-7 and 2.9-8. Comparing Equations 2.9-12 and
2.1-6, we observe that $R_{X_l}(\tau)$ is twice the lowpass equivalent of $R_X(\tau)$. In other words,
the autocorrelation function of the lowpass equivalent process $X_l(t)$ is twice the lowpass
equivalent of the autocorrelation function of the bandpass process X(t).

Taking Fourier transform of both sides of Equation 2.9-12, we obtain

$$S_{X_l}(f) = \begin{cases} 4S_X(f + f_0) & |f| < f_0 \\ 0 & \text{otherwise} \end{cases} \tag{2.9-13}$$

and consequently,

$$S_X(f) = \frac{1}{4}\left[S_{X_l}(f - f_0) + S_{X_l}(-f - f_0)\right] \tag{2.9-14}$$

We also observe that if X(t) is a Gaussian process, then $X_i(t)$, $X_q(t)$, and $X_l(t)$ will
be jointly Gaussian processes; and since $X_l(t)$ is Gaussian, zero-mean, and proper, we
conclude that $X_l(t)$ is a circular process as well. In this case if $S_X(f + f_0) = S_X(f - f_0)$
for $|f| < f_0$, then $X_i(t)$ and $X_q(t)$ will be independent processes.

EXAMPLE 2.9-1. White Gaussian noise with power spectral density of $N_0/2$ passes
through an ideal bandpass filter with transfer function

$$H(f) = \begin{cases} 1 & \bigl||f| - f_0\bigr| < W \\ 0 & \text{otherwise} \end{cases}$$

where $W < f_0$. The output, called filtered white noise, is denoted by X(t). This process
has a power spectral density of

$$S_X(f) = \begin{cases} \frac{N_0}{2} & \bigl||f| - f_0\bigr| < W \\ 0 & \text{otherwise} \end{cases}$$

Since $S_X(f + f_0) = S_X(f - f_0)$ for $|f| < f_0$, and the process is Gaussian, $X_i(t)$ and
$X_q(t)$ are independent lowpass processes. Using Equation 2.9-9, we conclude that

$$S_{X_i}(f) = S_{X_q}(f) = \begin{cases} N_0 & |f| < W \\ 0 & \text{otherwise} \end{cases}$$

and from Equation 2.9-13, we conclude that

$$S_{X_l}(f) = \begin{cases} 2N_0 & |f| < W \\ 0 & \text{otherwise} \end{cases}$$


In this chapter we have provided a review of basic concepts and definitions in signal
analysis, the theory of probability, and stochastic processes. An advanced book on signal 
analysis that covers most of the material presented here in detail is the book by Franks 
(1969). The texts by Davenport and Root (1958), Davenport (1970), Papoulis and Pillai 
(2002), Peebles (1987), Helstrom (1991), Stark and Woods (2002), and Leon-Garcia 
(1994) provide engineering-oriented treatments of probability and stochastic processes. 
A more mathematical treatment of probability theory may be found in the text by Loeve 
(1955). Finally, we cite the book by Miller (1964), which treats multidimensional 
Gaussian distributions. 


PROBLEMS 


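2.1 Prove the following properties of Hilbert transforms:

a. If $x(t) = x(-t)$, then $\hat{x}(t) = -\hat{x}(-t)$.

b. If $x(t) = -x(-t)$, then $\hat{x}(t) = \hat{x}(-t)$.

c. If $x(t) = \cos \omega_0 t$, then $\hat{x}(t) = \sin \omega_0 t$.

d. If $x(t) = \sin \omega_0 t$, then $\hat{x}(t) = -\cos \omega_0 t$.

e. $\hat{\hat{x}}(t) = -x(t)$

f. $\int_{-\infty}^{\infty} x^2(t)\, dt = \int_{-\infty}^{\infty} \hat{x}^2(t)\, dt$

g. $\int_{-\infty}^{\infty} x(t)\hat{x}(t)\, dt = 0$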


2.2 Let x(t) and y(t) denote two bandpass signals, and let $x_l(t)$ and $y_l(t)$ denote their lowpass
equivalents with respect to some frequency $f_0$. We know that in general $x_l(t)$ and $y_l(t)$ are
complex signals.

1. Show that

$$\int_{-\infty}^{\infty} x(t) y(t)\, dt = \frac{1}{2}\, \mathrm{Re}\left[\int_{-\infty}^{\infty} x_l(t) y_l^*(t)\, dt\right]$$

2. From this conclude that $\mathcal{E}_x = \frac{1}{2}\mathcal{E}_{x_l}$, i.e., the energy in a bandpass signal is one-half the
energy in its lowpass equivalent.


2.3 Suppose that s(t) is either a real- or complex-valued signal that is represented as a linear
combination of orthonormal functions $\{f_n(t)\}$, i.e.,

$$\hat{s}(t) = \sum_{k=1}^{K} s_k f_k(t)$$

where

$$\int_{-\infty}^{\infty} f_n(t) f_m^*(t)\, dt = \begin{cases} 1 & m = n \\ 0 & m \ne n \end{cases}$$

Determine the expressions for the coefficients $\{s_k\}$ in the expansion $\hat{s}(t)$ that minimize the
energy

$$\mathcal{E}_e = \int_{-\infty}^{\infty} |s(t) - \hat{s}(t)|^2\, dt$$

and the corresponding residual error $\mathcal{E}_e$.

2.4 Suppose that a set of M signal waveforms $\{s_m(t)\}$ is complex-valued. Derive the equations
for the Gram-Schmidt procedure that will result in a set of $N \le M$ orthonormal signal
waveforms.


2.5 Carry out the Gram-Schmidt orthogonalization of the signals in Figure 2.2-1 (a) in the order 
suit), and thus obtain a set of orthonormal functions {/,„(?)}. Then determine 

the vector representation of the signals { (?)} by using the orthonormal functions {/,„(?)}• 
Also determine the signal energies. 


2.6 Assuming that the set of signals $\{\phi_{nl}(t), n = 1, \ldots, N\}$ is an orthonormal basis for
representation of $\{s_{ml}(t), m = 1, \ldots, M\}$, show that the set of functions given by
Equation 2.2-54 constitutes a 2N orthonormal basis that is sufficient for representation of M
bandpass signals given in Equation 2.2-55.

2.7 Show that

$$\tilde{\phi}(t) = -\hat{\phi}(t)$$

where $\hat{\phi}(t)$ denotes the Hilbert transform and $\phi$ and $\tilde{\phi}$ are given by Equation 2.2-54.

2.8 Determine the correlation coefficients $\rho_{km}$ among the four signal waveforms $\{s_i(t)\}$ shown
in Figure 2.2-1 and their corresponding Euclidean distances.

2.9 Prove that $s_l(t)$ is generally a complex-valued signal, and give the condition under which
it is real. Assume that s(t) is a real-valued bandpass signal.


2.10 Consider the three waveforms $f_n(t)$ shown in Figure P2.10.

[Figure: waveforms $f_1(t)$, $f_2(t)$, and $f_3(t)$]

FIGURE P2.10

a. Show that these waveforms are orthonormal.

b. Express the waveform x(t) as a linear combination of $f_n(t)$, n = 1, 2, 3, if

$$x(t) = \begin{cases} -1 & 0 < t < 1 \\ 1 & 1 < t < 3 \\ -1 & 3 < t < 4 \end{cases}$$

and determine the weighting coefficients.


2.11 Consider the four waveforms shown in Figure P2.11.

a. Determine the dimensionality of the waveforms and a set of basis functions.

b. Use the basis functions to represent the four waveforms by vectors $s_1$, $s_2$, $s_3$, and $s_4$.

c. Determine the minimum distance between any pair of vectors.

[Figure: waveforms $s_1(t)$, $s_2(t)$, $s_3(t)$, and $s_4(t)$ plotted versus t]

FIGURE P2.11


2.12 Determine a set of orthonormal functions for the four signals shown in Figure P2.12. 




FIGURE P2.12 


2.13 A random experiment consists of drawing a ball from an urn that contains 4 red balls
numbered 1, 2, 3, 4 and three black balls numbered 1, 2, 3. The following events are
defined.

1. $E_1$ = The number on the ball is even.

2. $E_2$ = The color of the ball is red, and its number is greater than 1.

3. $E_3$ = The number on the ball is less than 3.

4. $E_4 = E_1 \cup E_3$

5. $E_5 = E_1 \cup (E_2 \cap E_3)$

Answer the following questions.

1. What is $P(E_2)$?

2. What is $P(E_3 \mid E_2)$?

3. What is $P(E_2 \mid E_4 E_3)$?

4. Are $E_3$ and $E_5$ independent?

2.14 In a certain city three car brands A, B, C have 20%, 30% and 50% of the market share, 
respectively. The probability that a car needs major repair during its first year of purchase 
for the three brands is 5%, 10%, and 15%, respectively. 

1. What is the probability that a car in this city needs major repair during its first year of 
purchase? 

2. If a car in this city needs major repair during its first year of purchase, what is the 
probability that it is made by manufacturer A? 

2.15 The random variables $X_i$, $i = 1, 2, \ldots, n$, have joint PDF $p(x_1, x_2, \ldots, x_n)$. Prove that

\[ p(x_1, x_2, \ldots, x_n) = p(x_n \mid x_{n-1}, \ldots, x_1)\, p(x_{n-1} \mid x_{n-2}, \ldots, x_1) \cdots p(x_3 \mid x_2, x_1)\, p(x_2 \mid x_1)\, p(x_1) \]

2.16 A communication channel with binary input and ternary output alphabets is shown in 
Figure P2.16. The probability of the input being 0 is 0.4. The transition probabilities are 
shown on the figure. 



1. If the channel output is A, what is the best decision on channel input that minimizes 
the error probability? Repeat for the cases where channel output is B and C. 

2. If a 0 is transmitted and an optimal decision scheme (the one derived in part 1) is used 
at the receiver, what is the probability of error? 

3. What is the overall error probability for this channel if the optimal decision scheme is
used at the receiver?


2.17 The PDF of a random variable $X$ is $p(x)$. A random variable $Y$ is defined as

Y = aX + b 

where a < 0. Determine the PDF of Y in terms of the PDF of X. 

2.18 Suppose that $X$ is a Gaussian random variable with zero mean and unit variance. Let

\[ Y = aX^3 + b, \qquad a > 0 \]

Determine and plot the PDF of $Y$.

2.19 The noise voltage in an electric circuit can be modeled as a Gaussian random variable with
mean equal to zero and variance equal to $10^{-8}$.

1. What is the probability that the value of the noise exceeds $10^{-4}$? What is the probability
that it exceeds $4 \times 10^{-4}$? What is the probability that the noise value is between
$-2 \times 10^{-4}$ and $10^{-4}$?

2. Given that the value of the noise is positive, what is the probability that it exceeds $10^{-4}$?


2.20 $X$ is a $\mathcal{N}(0, \sigma^2)$ random variable. This random variable is passed through a system whose
input-output relation is given by $y = g(x)$. Find the PDF or the PMF of the output random
variable $Y$ in each of the following cases.

1. Square-law device, $g(x) = ax^2$.

2. Limiter,

\[ g(x) = \begin{cases} -b & x \le -b \\ \phantom{-}b & x \ge b \\ \phantom{-}x & |x| < b \end{cases} \]

3. Hard limiter,

\[ g(x) = \begin{cases} a & x > 0 \\ 0 & x = 0 \\ b & x < 0 \end{cases} \]

4. Quantizer, $g(x) = x_n$ for $a_n \le x < a_{n+1}$, $1 \le n \le N$, where $x_n$ lies in the interval
$[a_n, a_{n+1}]$ and the sequence $\{a_1, a_2, \ldots, a_{N+1}\}$ satisfies the conditions $a_1 = -\infty$,
$a_{N+1} = \infty$, and for $i > j$ we have $a_i > a_j$.


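2.21 Show that for an $\mathcal{N}(m, \sigma^2)$ random variable we have

\[ E\left[(X - m)^n\right] = \begin{cases} 1 \times 3 \times 5 \times \cdots \times (2k - 1)\,\sigma^{2k} = \dfrac{(2k)!\,\sigma^{2k}}{2^k\, k!} & \text{for } n = 2k \\[1ex] 0 & \text{for } n = 2k + 1 \end{cases} \]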


2.22 a. Let $X_r$ and $X_i$ be statistically independent zero-mean Gaussian random variables with
identical variance. Show that a (rotational) transformation of the form

\[ Y_r + jY_i = (X_r + jX_i)\,e^{j\phi} \]

results in another pair $(Y_r, Y_i)$ of Gaussian random variables that have the same joint
PDF as the pair $(X_r, X_i)$.

b. Note that

\[ \begin{bmatrix} Y_r \\ Y_i \end{bmatrix} = A \begin{bmatrix} X_r \\ X_i \end{bmatrix} \]

where $A$ is a $2 \times 2$ matrix. As a generalization of the two-dimensional transformation
of the Gaussian random variables considered in (a), what property must the linear
transformation $A$ satisfy if the PDFs for $X$ and $Y$, where $Y = AX$, $X = (X_1\ X_2\ \cdots\ X_n)$,
and $Y = (Y_1\ Y_2\ \cdots\ Y_n)$, are identical?


2.23 Show that if $X$ is a Gaussian vector, the random vector $Y = AX$, where the invertible
matrix $A$ represents a linear transformation, is also a Gaussian vector whose mean and
covariance matrix are given by

\[ m_Y = A\, m_X \]
\[ C_Y = A\, C_X A^{t} \]


2.24 The random variable $Y$ is defined as

\[ Y = \sum_{i=1}^{n} X_i \]

where the $X_i$, $i = 1, 2, \ldots, n$, are statistically independent random variables with

\[ X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases} \]

a. Determine the characteristic function of $Y$.

b. From the characteristic function, determine the moments $E(Y)$ and $E(Y^2)$.


2.25 This problem provides some useful bounds on $Q(x)$.

1. By integrating $e^{-\frac{u^2 + v^2}{2}}$ on the region $u > x$ and $v > x$ in $\mathbb{R}^2$, where $x > 0$, then
changing to polar coordinates and upper bounding the integration region by the region
$r > \sqrt{2}\,x$ in the first quadrant, show that $Q(x) \le \frac{1}{2} e^{-x^2/2}$ for all $x \ge 0$.

2. Apply integration by parts to

\[ \int_x^{\infty} e^{-y^2/2}\, \frac{dy}{y^2} \]

and show that

\[ \frac{x}{\sqrt{2\pi}\,(1 + x^2)}\, e^{-x^2/2} < Q(x) < \frac{1}{\sqrt{2\pi}\,x}\, e^{-x^2/2} \]

for all $x > 0$.

3. Based on the result of part 2 show that, for large $x$,

\[ Q(x) \approx \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2} \]

2.26 Let $X_1, X_2, X_3, \ldots$ denote iid random variables each uniformly distributed on $[0, A]$,
where $A > 0$. Let $Y_n = \min\{X_1, X_2, \ldots, X_n\}$.

1. What is the PDF of $Y_n$?

2. Show that if both $A$ and $n$ go to infinity such that $n/A = \lambda$, where $\lambda > 0$ is a constant,
the density function of $Y_n$ tends to an exponential density function. Specify this density
function.


2.27 The four random variables $X_1, X_2, X_3, X_4$ are zero-mean jointly Gaussian random
variables with covariance $C_{ij} = E(X_i X_j)$ and characteristic function $\Phi(\omega_1, \omega_2, \omega_3, \omega_4)$.
Show that

\[ E(X_1 X_2 X_3 X_4) = C_{12}C_{34} + C_{13}C_{24} + C_{14}C_{23} \]

2.28 Let

\[ \Theta_X(t) = E\left[e^{tX}\right] \]

denote the moment generating function of random variable $X$.

1. Using the Chernov bound, show that

\[ \ln P[X \ge \alpha] \le -\max_{t \ge 0}\left(\alpha t - \ln \Theta_X(t)\right) \]

2. Define

\[ I(\alpha) = \max_{t \ge 0}\left(\alpha t - \ln \Theta_X(t)\right) \]

as the large-deviation rate function of the random variable $X$, and let $X_1, X_2, \ldots, X_n$
be iid. Define $S_n = (X_1 + X_2 + \cdots + X_n)/n$. Show that for $\alpha \ge E[X]$

\[ \frac{1}{n} \ln P[S_n \ge \alpha] \le -I(\alpha) \]

or equivalently

\[ P[S_n \ge \alpha] \le e^{-nI(\alpha)} \]

Note: It can be shown that for $\alpha \ge E[X]$, we have $P[S_n \ge \alpha] = e^{-nI(\alpha) + o(n)}$, where
$o(n)/n \to 0$ as $n \to \infty$. This result is known as the large-deviation theorem.

3. Now assume the $X_i$'s are exponential, i.e.,

\[ p_X(x) = \begin{cases} e^{-x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases} \]

Using the large-deviation result, show that

\[ P[S_n \ge \alpha] = \alpha^n e^{-n(\alpha - 1) + o(n)} \]

for $\alpha \ge 1$.

2.29 From the characteristic functions for the central chi-square and noncentral chi-square 
random variables given in Table 2.3-3, determine their corresponding first and second 
moments. 

2.30 The PDF of a Cauchy distributed random variable $X$ is

\[ p(x) = \frac{a/\pi}{x^2 + a^2}, \qquad -\infty < x < \infty \]

a. Determine the mean and variance of X. 

b. Determine the characteristic function of X. 

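2.31 Let $R_0$ denote a Rayleigh random variable with PDF

\[ f_{R_0}(r_0) = \begin{cases} \dfrac{r_0}{\sigma^2}\, e^{-\frac{r_0^2}{2\sigma^2}} & r_0 \ge 0 \\[1ex] 0 & \text{otherwise} \end{cases} \]

and $R_1$ be Ricean with PDF

\[ f_{R_1}(r_1) = \begin{cases} \dfrac{r_1}{\sigma^2}\, I_0\!\left(\dfrac{\mu r_1}{\sigma^2}\right) e^{-\frac{r_1^2 + \mu^2}{2\sigma^2}} & r_1 \ge 0 \\[1ex] 0 & \text{otherwise} \end{cases} \]

Furthermore, assume that $R_0$ and $R_1$ are independent. Show that

\[ P(R_0 > R_1) = \frac{1}{2}\, e^{-\frac{\mu^2}{4\sigma^2}} \]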

2.32 Suppose that we have a complex-valued Gaussian random variable $Z = X + jY$, where
$(X, Y)$ are statistically independent variables with zero mean and variance $E[X^2] =
E[Y^2] = \sigma^2$. Let $R = Z + m$, where $m = m_r + jm_i$, and define $R$ as $R = A + jB$.
Clearly, $A = X + m_r$ and $B = Y + m_i$. Determine the following probability density
functions:

1. $p_{A,B}(a, b)$

2. $p_{U,\Phi}(u, \phi)$, where $U = \sqrt{A^2 + B^2}$ and $\Phi = \tan^{-1}(B/A)$

3. $p_U(u)$

Note: In part 2 it is convenient to define $\theta = \tan^{-1}(m_i/m_r)$ so that

\[ m_r = \sqrt{m_r^2 + m_i^2}\,\cos\theta, \qquad m_i = \sqrt{m_r^2 + m_i^2}\,\sin\theta \]

Furthermore, you must use Equation 2.3-34, defining $I_0(\cdot)$ as the modified Bessel function
of order zero.

2.33 The random variable $Y$ is defined as

\[ Y = \frac{1}{n} \sum_{i=1}^{n} X_i \]

where $X_i$, $i = 1, 2, \ldots, n$, are statistically independent and identically distributed random
variables each of which has the Cauchy PDF given in Problem 2.30.

a. Determine the characteristic function of Y. 

b. Determine the PDF of Y. 

c. Consider the PDF of Y in the limit as n — > oo. Does the central limit theorem hold? 
Explain your answer. 

2.34 Show that if $Z$ is circular, then it is zero-mean and proper, i.e., $E[Z] = 0$ and $E[ZZ^t] = 0$.

2.35 Show that if Z is a zero-mean proper Gaussian complex vector, then Z is circular. 

2.36 Show that if Z is a proper complex vector, then any transform of the form W = AZ + b 
is also a proper complex vector. 

2.37 Assume that random processes $X(t)$ and $Y(t)$ are individually and jointly stationary.

a. Determine the autocorrelation function of $Z(t) = X(t) + Y(t)$.

b. Determine the autocorrelation function of $Z(t)$ when $X(t)$ and $Y(t)$ are uncorrelated.

c. Determine the autocorrelation function of $Z(t)$ when $X(t)$ and $Y(t)$ are uncorrelated
and have zero means.




2.38 The autocorrelation function of a stochastic process $X(t)$ is

\[ R_X(\tau) = \tfrac{1}{2} N_0\, \delta(\tau) \]

Such a process is called white noise. Suppose x(t) is the input to an ideal bandpass filter 
having the frequency response characteristic shown in Figure P2.38. Determine the total 
noise power at the output of the filter. 


FIGURE P2.38 











2.39 A lowpass Gaussian stochastic process $X(t)$ has a power spectral density

\[ S_X(f) = \begin{cases} N_0 & |f| < B \\ 0 & \text{otherwise} \end{cases} \]

Determine the power spectral density and the autocorrelation function of $Y(t) = X^2(t)$.


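2.40 The covariance matrix of three random variables $X_1$, $X_2$, and $X_3$ is

\[ \begin{bmatrix} c_{11} & 0 & c_{13} \\ 0 & c_{22} & 0 \\ c_{31} & 0 & c_{33} \end{bmatrix} \]

The linear transformation $Y = AX$ is made where

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]

Determine the covariance matrix of $Y$.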


2.41 Let $X(t)$ be a stationary real normal process with zero mean. Let a new process $Y(t)$ be
defined by

\[ Y(t) = X^2(t) \]

Determine the autocorrelation function of $Y(t)$ in terms of the autocorrelation function of
$X(t)$. Hint: Use the result on Gaussian variables derived in Problem 2.27.


2.42 For the Nakagami PDF, given by Equation 2.3-67, define the normalized random variable
$X = R/\sqrt{\Omega}$. Determine the PDF of $X$.

2.43 The input $X(t)$ in the circuit shown in Figure P2.43 is a stochastic process with $E[X(t)] = 0$
and $R_X(\tau) = \sigma^2 \delta(\tau)$; i.e., $X(t)$ is a white noise process.

a. Determine the spectral density $S_Y(f)$.

b. Determine $R_Y(\tau)$ and $E[Y^2(t)]$.

FIGURE P2.43 (circuit with input $X(t)$, resistor $R$, and output $Y(t)$)


2.44 Demonstrate the validity of Equation 2.8-6. 

2.45 Use the Chernoff bound to show that $Q(x) \le e^{-x^2/2}$.

2.46 Determine the mean, the autocorrelation sequence, and the power density spectrum of the 
output of a system with unit sample response 

1 n = 0 


„ 0 otherwise 

when the input $x(n)$ is a white noise process with variance $\sigma^2$.

2.47 The autocorrelation sequence of a discrete-time stochastic process is R{k) = 
Determine its power density spectrum. 

2.48 A discrete-time stochastic process $X(n) = X(nT)$ is obtained by periodic sampling of a
continuous-time zero-mean stationary process $X(t)$, where $T$ is the sampling interval; i.e.,
$f_s = 1/T$ is the sampling rate.

a. Determine the relationship between the autocorrelation function of $X(t)$ and the
autocorrelation sequence of $X(n)$.

b. Express the power density spectrum of $X(n)$ in terms of the power density spectrum of
the process $X(t)$.

c. Determine the conditions under which the power density spectrum of $X(n)$ is equal to
the power density spectrum of $X(t)$.

2.49 The random process $V(t)$ is defined as

\[ V(t) = X \cos 2\pi f_c t - Y \sin 2\pi f_c t \]

where $X$ and $Y$ are random variables. Show that $V(t)$ is wide-sense stationary if and only
if $E(X) = E(Y) = 0$, $E(X^2) = E(Y^2)$, and $E(XY) = 0$.

2.50 Consider a band-limited zero-mean stationary stochastic process $X(t)$ with power density
spectrum

\[ S_X(f) = \begin{cases} 1 & |f| \le W \\ 0 & \text{otherwise} \end{cases} \]

$X(t)$ is sampled at a rate $f_s = 1/T$ to yield a discrete-time process $X(n) = X(nT)$.

a. Determine the expression for the autocorrelation sequence of $X(n)$.

b. Determine the minimum value of $T$ that results in a white (spectrally flat) sequence.

c. Repeat (b) if the power density spectrum of $X(t)$ is

\[ S_X(f) = \begin{cases} 1 - |f|/W & |f| \le W \\ 0 & \text{otherwise} \end{cases} \]


2.51 Show that the functions

\[ f_k(t) = \operatorname{sinc}\left[2W\left(t - \frac{k}{2W}\right)\right], \qquad k = 0, \pm 1, \pm 2, \ldots \]

are orthogonal over the real line, i.e.,

\[ \int_{-\infty}^{\infty} f_k(t) f_j(t)\, dt = \begin{cases} 1/2W & k = j \\ 0 & \text{otherwise} \end{cases} \]

Therefore, the sampling theorem reconstruction formula may be viewed as a series
expansion of the band-limited signal $s(t)$, where the weights are samples of $s(t)$ and the
$\{f_k(t)\}$ are the set of orthogonal functions used in the series expansion.


2.52 The noise equivalent bandwidth of a system is defined as

\[ B_{eq} = \frac{1}{G} \int_0^{\infty} |H(f)|^2\, df \]

where $G = \max_f |H(f)|^2$. Using this definition, determine the noise equivalent bandwidth
of the ideal bandpass filter shown in Figure P2.38 and the low-pass system shown in
Figure P2.43.


2.53 Suppose that $N(t)$ is a zero-mean stationary narrowband process. The autocorrelation
function of the equivalent lowpass process $Z(t) = X(t) + jY(t)$ is defined as

\[ R_Z(\tau) = E\left[Z^*(t) Z(t + \tau)\right] \]

a. Show that

\[ E\left[Z(t) Z(t + \tau)\right] = 0 \]

b. Suppose $R_Z(\tau) = N_0 \delta(\tau)$, and let

\[ V = \int_0^T Z(t)\, dt \]

Determine $E[V^2]$ and $E[|V|^2]$.

2.54 Determine the autocorrelation function of the stochastic process

\[ X(t) = A \sin(2\pi f_c t + \Theta) \]

where $f_c$ is a constant and $\Theta$ is a uniformly distributed phase, i.e.,

\[ p(\theta) = \frac{1}{2\pi}, \qquad 0 \le \theta \le 2\pi \]

2.55 Let $Z(t) = X(t) + jY(t)$ be a complex random process, where $X(t)$ and $Y(t)$ are
real-valued, independent, zero-mean, and jointly stationary Gaussian random processes. We
assume that $X(t)$ and $Y(t)$ are both band-limited processes with a bandwidth of $W$ and a
flat spectral density within their bandwidth, i.e.,


\[ S_X(f) = S_Y(f) = \begin{cases} N_0 & |f| < W \\ 0 & \text{otherwise} \end{cases} \]

1. Find $E[Z(t)]$ and $R_Z(t + \tau, t)$, and show that $Z(t)$ is WSS.

2. Find the power spectral density of $Z(t)$.

3. Assume $\phi_1(t), \phi_2(t), \ldots, \phi_n(t)$ are orthonormal, i.e.,

\[ \int_{-\infty}^{\infty} \phi_j(t) \phi_k^*(t)\, dt = \begin{cases} 1 & j = k \\ 0 & \text{otherwise} \end{cases} \]

and all $\phi_j(t)$'s are band-limited to $[-W, W]$. Define random variables $Z_j$ as the
projections of $Z(t)$ on the $\phi_j(t)$'s, i.e.,

\[ Z_j = \int_{-\infty}^{\infty} Z(t) \phi_j^*(t)\, dt, \qquad j = 1, 2, \ldots, n \]

Determine $E[Z_j]$ and $E[Z_j Z_k^*]$ and conclude that the $Z_j$'s are iid zero-mean Gaussian
random variables. Find their common variance.

4. Let $Z_j = Z_{jr} + jZ_{ji}$, where $Z_{jr}$ and $Z_{ji}$ denote the real and imaginary parts,
respectively, of $Z_j$. Comment on the joint probability distribution of the $2n$ random variables

\[ (Z_{1r}, Z_{1i}, Z_{2r}, Z_{2i}, \ldots, Z_{nr}, Z_{ni}) \]

5. Let us define

\[ \hat{Z}(t) = Z(t) - \sum_{j=1}^{n} Z_j \phi_j(t) \]

to be the error in expansion of $Z(t)$ as a linear combination of $\phi_j(t)$'s. Show that
$E[\hat{Z}(t) Z_k^*] = 0$ for all $k = 1, 2, \ldots, n$. In other words, show that the error $\hat{Z}(t)$ and all
the $Z_k$'s are uncorrelated. Can you say $\hat{Z}(t)$ and the $Z_k$'s are independent?

2.56 Let $X(t)$ denote a (real, zero-mean, WSS) bandpass process with autocorrelation function
$R_X(\tau)$ and power spectral density $S_X(f)$, where $S_X(0) = 0$, and let $\hat{X}(t)$ denote the
Hilbert transform of $X(t)$. Then $\hat{X}(t)$ can be viewed as the output of a filter, with impulse
response $\frac{1}{\pi t}$ and transfer function $-j\,\mathrm{sgn}(f)$, whose input is $X(t)$. Recall that when $X(t)$
passes through a system with transfer function $H(f)$ and the output is $Y(t)$, we have
$S_Y(f) = S_X(f)|H(f)|^2$ and $S_{XY}(f) = S_X(f)H^*(f)$.

1. Prove that $R_{\hat{X}}(\tau) = R_X(\tau)$.

2. Prove that $R_{X\hat{X}}(\tau) = -\hat{R}_X(\tau)$.

3. If $Z(t) = X(t) + j\hat{X}(t)$, determine $S_Z(f)$.

4. Define $X_l(t) = Z(t)e^{-j2\pi f_0 t}$. Show that $X_l(t)$ is a lowpass WSS random process, and
determine $S_{X_l}(f)$. From the expression for $S_{X_l}(f)$, derive an expression for $R_{X_l}(\tau)$.

2.57 A noise process has a power spectral density given by

\[ S_n(f) = \begin{cases} 10^{-8}\left(1 - \dfrac{|f|}{10^8}\right) & |f| < 10^8 \\[1ex] 0 & |f| > 10^8 \end{cases} \]


This noise is passed through an ideal bandpass filter with a bandwidth of 2 MHz centered 
at 50 MHz. 

1. Find the power content of the output process. 

2. Write the output process in terms of the in-phase and quadrature components, and find 
the power in each component. Assume $f_0 = 50$ MHz.

3. Find the power spectral density of the in-phase and quadrature components. 

4. Now assume that the filter is not an ideal filter and is described by 



49 MHz < |/| < 51 MHz 


otherwise 


Repeat parts 1, 2, and 3 with this assumption. 



Digital Modulation Schemes 


The digital data are usually in the form of a stream of binary data, i.e., a sequence
of 0s and 1s. Regardless of whether these data are inherently digital (for instance, the
output of a computer terminal generating ASCII code) or the result of analog-to-digital 
conversion of an analog source (for instance, digital audio and video), the goal is to re- 
liably transmit these data to the destination by using the given communication channel. 
Depending on the nature of the communication channel, data can suffer from one or 
more of certain channel impairments including noise, attenuation, distortion, fading, and 
interference. To transmit the binary stream over the communication channel, we need to 
generate a signal that represents the binary data stream and matches the characteristics 
of the channel. This signal should represent the binary data, meaning that we should be 
able to retrieve the binary stream from the signal; and it should match the characteristics 
of the channel, meaning that its bandwidth should match the bandwidth of the channel, 
and it should be able to resist the impairments caused by the channel. Since different 
channels cause different types of impairments, signals designed for these channels can 
be drastically different. The process of mapping a digital sequence to signals for trans- 
mission over a communication channel is called digital modulation or digital signaling. 
In the process of modulation, usually the transmitted signals are bandpass signals suit- 
able for transmission in the bandwidth provided by the communication channel. In this 
chapter we study the most commonly used modulation schemes and their properties. 


■ 3.1 

REPRESENTATION OF DIGITALLY MODULATED SIGNALS 

The mapping between the digital sequence (which we assume to be a binary sequence) 
and the signal sequence to be transmitted over the channel can be either memoryless or 
with memory, resulting in memoryless modulation schemes and modulation schemes 
with memory. In a memoryless modulation scheme, the binary sequence is parsed into 
subsequences each of length $k$, and each sequence is mapped into one of the $s_m(t)$,
$1 \le m \le 2^k$, signals regardless of the previously transmitted signals. This modulation
scheme is equivalent to a mapping from $M = 2^k$ messages to $M$ possible signals, as
shown in Figure 3.1-1.

FIGURE 3.1-1

Block diagram of a memoryless digital modulation scheme.

In a modulation scheme with memory, the mapping is from the set of the current
$k$ bits and the past $(L - 1)k$ bits to the set of possible $M = 2^k$ messages. In this case
the transmitted signal depends on the current $k$ bits as well as the most recent $L - 1$
blocks of $k$ bits. This defines a finite-state machine with $2^{(L-1)k}$ states. The mapping that
defines the modulation scheme can be viewed as a mapping from the current state and
the current input of the modulator to the set of output signals resulting in a new state of
the modulator. If at time instant $\ell - 1$ the modulator is in state $S_{\ell-1} \in \{1, 2, \ldots, 2^{(L-1)k}\}$
and the input sequence is $I_\ell \in \{1, 2, \ldots, 2^k\}$, then the modulator transmits the output
$s_{m_\ell}(t)$ and moves to new state $S_\ell$ according to mappings

\[ m_\ell = f_m(S_{\ell-1}, I_\ell) \tag{3.1-1} \]
\[ S_\ell = f_s(S_{\ell-1}, I_\ell) \tag{3.1-2} \]

Parameters $k$ and $L$ and functions $f_m(\cdot, \cdot)$ and $f_s(\cdot, \cdot)$ completely describe the
modulation scheme with memory. Parameter $L$ is called the constraint length of modulation.
The case of $L = 1$ corresponds to a memoryless modulation scheme.

Note the similarity between Equations 3.1-1 and 3.1-2 on one hand and Equations
2.7-43 and 2.7-44 on the other hand. Equation 3.1-2 represents the internal
dynamics of a Markov chain where the future state depends on the current state and
the input $I_\ell$ (which is a random variable), and Equation 3.1-1 states that the output
$m_\ell$ depends on the state through random variable $I_\ell$. Therefore, we can conclude that
modulation systems with memory are effectively represented by Markov chains.
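
As a minimal sketch of the state-machine view in Equations 3.1-1 and 3.1-2, the Python fragment below steps a modulator with memory through an input sequence. The helper name `run_modulator`, the zero-based indexing, and the particular choice of $f_m$ and $f_s$ (with $k = 1$, $L = 2$, the state holding the previous input bit) are illustrative assumptions, not a scheme taken from the text.

```python
def run_modulator(inputs, f_m, f_s, initial_state):
    """Step a modulator with memory through a sequence of inputs.

    f_m(state, inp) returns the index of the transmitted signal (Eq. 3.1-1),
    f_s(state, inp) returns the next state (Eq. 3.1-2).
    """
    state = initial_state
    outputs = []
    for inp in inputs:
        outputs.append(f_m(state, inp))  # output depends on state and input
        state = f_s(state, inp)          # then the state is updated
    return outputs, state


# Hypothetical example with k = 1 and L = 2, so there are 2**((L-1)*k) = 2 states.
# The state stores the previous input bit; the output index is 1 when the bit changes.
def f_m(state, inp):
    return state ^ inp


def f_s(state, inp):
    return inp


outputs, final_state = run_modulator([1, 0, 1, 1], f_m, f_s, initial_state=0)
print(outputs, final_state)
```

Setting `f_s` to a constant (a single state) reduces the sketch to the memoryless case $L = 1$, where the output depends on the current input only.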

In addition to classifying the modulation as either memoryless or having memory, 
we may classify it as either linear or nonlinear. Linearity of a modulation method re- 
quires that the principle of superposition apply in the mapping of the digital sequence 
into successive waveforms. In nonlinear modulation, the superposition principle does 
not apply to signals transmitted in successive time intervals. We shall begin by describ- 
ing memoryless modulation methods. 

As indicated above, the modulator in a digital communication system maps a
sequence of $k$ binary symbols (which in case of equiprobable symbols carries $k$ bits of
information) into a set of corresponding signal waveforms $s_m(t)$, $1 \le m \le M$, where
$M = 2^k$. We assume that these signals are transmitted at every $T_s$ seconds, where $T_s$ is
called the signaling interval. This means that in each second

\[ R_s = \frac{1}{T_s} \tag{3.1-3} \]

symbols are transmitted. Parameter $R_s$ is called the signaling rate or symbol rate. Since
each signal carries $k$ bits of information, the bit interval $T_b$, i.e., the interval in which
1 bit of information is transmitted, is given by

\[ T_b = \frac{T_s}{k} = \frac{T_s}{\log_2 M} \tag{3.1-4} \]

and the bit rate $R$ is given by

\[ R = kR_s = R_s \log_2 M \tag{3.1-5} \]
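
The relations among the signaling interval, symbol rate, bit interval, and bit rate (Equations 3.1-3 to 3.1-5) can be illustrated with a short Python sketch. The function name `signaling_parameters` and the numerical values ($M = 16$, $T_s = 1$ ms) are assumptions chosen only for illustration.

```python
import math


def signaling_parameters(M, Ts):
    """Return (k, Rs, Tb, R) for M-ary signaling with signaling interval Ts seconds."""
    k = math.log2(M)   # bits carried by each symbol
    Rs = 1.0 / Ts      # symbol rate, Eq. 3.1-3
    Tb = Ts / k        # bit interval, Eq. 3.1-4
    R = k * Rs         # bit rate, Eq. 3.1-5
    return k, Rs, Tb, R


# Illustrative values: 16-ary signaling with a 1 ms signaling interval.
k, Rs, Tb, R = signaling_parameters(M=16, Ts=1e-3)
print(f"k = {k}, Rs = {Rs} symbols/s, Tb = {Tb} s, R = {R} bits/s")
```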


If the energy content of $s_m(t)$ is denoted by $\mathcal{E}_m$, then the average signal energy is
given by

\[ \mathcal{E}_{avg} = \sum_{m=1}^{M} p_m \mathcal{E}_m \tag{3.1-6} \]

where $p_m$ indicates the probability of the $m$th signal (message probability). In the case
of equiprobable messages, $p_m = 1/M$, and therefore,

\[ \mathcal{E}_{avg} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{E}_m \tag{3.1-7} \]

Obviously, if all signals have the same energy, then $\mathcal{E}_m = \mathcal{E}$ and $\mathcal{E}_{avg} = \mathcal{E}$. The average
energy for transmission of 1 bit of information, or average energy per bit, when the
signals are equiprobable is given by

\[ \mathcal{E}_{bavg} = \frac{\mathcal{E}_{avg}}{k} = \frac{\mathcal{E}_{avg}}{\log_2 M} \tag{3.1-8} \]

If all signals have equal energy of $\mathcal{E}$, then

\[ \mathcal{E}_b = \frac{\mathcal{E}}{k} = \frac{\mathcal{E}}{\log_2 M} \tag{3.1-9} \]

If a communication system is transmitting an average energy of $\mathcal{E}_{bavg}$ per bit, and
it takes $T_b$ seconds to transmit this average energy, then the average power sent by the
transmitter is

\[ P_{avg} = \frac{\mathcal{E}_{bavg}}{T_b} = R\,\mathcal{E}_{bavg} \tag{3.1-10} \]

which for the case of equal energy signals becomes

\[ P = R\,\mathcal{E}_b \tag{3.1-11} \]
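
The chain from per-signal energies to average energy, energy per bit, and average power (Equations 3.1-6, 3.1-8, and 3.1-10) is sketched below in Python. The function names and the example energies are assumptions made for illustration; the energies $[1, 9, 9, 1]$ correspond to amplitudes $\pm 1, \pm 3$ with a unit-energy pulse.

```python
import math


def average_energy(energies, probs=None):
    """Average signal energy, Eq. 3.1-6; equiprobable messages if probs is None (Eq. 3.1-7)."""
    M = len(energies)
    if probs is None:
        probs = [1.0 / M] * M
    return sum(p * e for p, e in zip(probs, energies))


def energy_per_bit(e_avg, M):
    """Average energy per bit, Eq. 3.1-8."""
    return e_avg / math.log2(M)


def average_power(e_bavg, R):
    """Average transmitted power for bit rate R, Eq. 3.1-10."""
    return R * e_bavg


energies = [1.0, 9.0, 9.0, 1.0]                   # illustrative per-signal energies
e_avg = average_energy(energies)
e_bavg = energy_per_bit(e_avg, M=len(energies))
p_avg = average_power(e_bavg, R=4000.0)           # assumed bit rate in bits/s
print(e_avg, e_bavg, p_avg)
```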


■ 3.2 

MEMORYLESS MODULATION METHODS 

The waveforms $s_m(t)$ used to transmit information over the communication channel can
be, in general, of any form. However, usually these waveforms are bandpass signals
which may differ in amplitude or phase or frequency, or some combination of two
or more signal parameters. We consider each of these signal types separately,
beginning with digital pulse amplitude modulation (PAM). In all cases, we assume that the
sequence of binary digits at the input to the modulator occurs at a rate of $R$ bits/s.


3.2-1 Pulse Amplitude Modulation (PAM) 


In digital PAM, the signal waveforms may be represented as

\[ s_m(t) = A_m p(t), \qquad 1 \le m \le M \tag{3.2-1} \]

where $p(t)$ is a pulse of duration $T$ and $\{A_m, 1 \le m \le M\}$ denotes the set of $M$ possible
amplitudes corresponding to $M = 2^k$ possible $k$-bit blocks of symbols. Usually, the
signal amplitudes $A_m$ take the discrete values

\[ A_m = 2m - 1 - M, \qquad m = 1, 2, \ldots, M \tag{3.2-2} \]

i.e., the amplitudes are $\pm 1, \pm 3, \pm 5, \ldots, \pm(M - 1)$. The waveform $p(t)$ is a real-valued
signal pulse whose shape influences the spectrum of the transmitted signal, as we shall
observe later.

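The energy in signal $s_m(t)$ is given by

\[ \mathcal{E}_m = \int_{-\infty}^{\infty} A_m^2 p^2(t)\, dt \tag{3.2-3} \]
\[ = A_m^2 \mathcal{E}_p \tag{3.2-4} \]

where $\mathcal{E}_p$ is the energy in $p(t)$. From this,

\[ \mathcal{E}_{avg} = \frac{\mathcal{E}_p}{M} \sum_{m=1}^{M} A_m^2 = \frac{2\mathcal{E}_p}{M}\left(1^2 + 3^2 + 5^2 + \cdots + (M - 1)^2\right) = \frac{2\mathcal{E}_p}{M} \times \frac{M(M^2 - 1)}{6} = \frac{(M^2 - 1)\mathcal{E}_p}{3} \tag{3.2-5} \]

and

\[ \mathcal{E}_{bavg} = \frac{(M^2 - 1)\mathcal{E}_p}{3 \log_2 M} \tag{3.2-6} \]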
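
The closed form for the average PAM energy can be checked numerically by summing $A_m^2$ directly over the amplitude set $A_m = 2m - 1 - M$. The following Python sketch does this for a few values of $M$; the function names and the choice $\mathcal{E}_p = 1$ are illustrative assumptions.

```python
def pam_amplitudes(M):
    """Amplitudes A_m = 2m - 1 - M for m = 1, ..., M (Eq. 3.2-2)."""
    return [2 * m - 1 - M for m in range(1, M + 1)]


def pam_average_energy(M, Ep=1.0):
    """Average energy obtained by direct summation over equiprobable amplitudes."""
    amps = pam_amplitudes(M)
    return Ep * sum(a * a for a in amps) / M


def pam_average_energy_closed_form(M, Ep=1.0):
    """Closed form (M^2 - 1) Ep / 3 (Eq. 3.2-5)."""
    return (M * M - 1) * Ep / 3.0


for M in (2, 4, 8, 16):
    direct = pam_average_energy(M)
    closed = pam_average_energy_closed_form(M)
    print(M, pam_amplitudes(M), direct, closed)
```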

What we described above is the baseband PAM in which no carrier modulation is
present. In many cases the PAM signals are carrier-modulated bandpass signals with
lowpass equivalents of the form $A_m g(t)$, where $A_m$ and $g(t)$ are real. In this case

\[ s_m(t) = \mathrm{Re}\left[s_{ml}(t)\, e^{j2\pi f_c t}\right] \tag{3.2-7} \]
\[ = \mathrm{Re}\left[A_m g(t)\, e^{j2\pi f_c t}\right] = A_m g(t) \cos(2\pi f_c t) \tag{3.2-8} \]

where $f_c$ is the carrier frequency. Comparing Equations 3.2-1 and 3.2-8, we note that
if in the generic form of PAM signaling we substitute

\[ p(t) = g(t) \cos(2\pi f_c t) \tag{3.2-9} \]

then we obtain the bandpass PAM. Using Equation 2.1-21, for bandpass PAM we have

\[ \mathcal{E}_p = \frac{1}{2} \mathcal{E}_g \tag{3.2-10} \]

and from Equations 3.2-5 and 3.2-6 we conclude

\[ \mathcal{E}_{avg} = \frac{(M^2 - 1)\mathcal{E}_g}{6} \tag{3.2-11} \]

and

\[ \mathcal{E}_{bavg} = \frac{(M^2 - 1)\mathcal{E}_g}{6 \log_2 M} \tag{3.2-12} \]

Clearly, PAM signals are one-dimensional ($N = 1$) since all are multiples of the
same basic signals. Using the result of Example 2.2-6, we get

\[ \phi(t) = \frac{p(t)}{\sqrt{\mathcal{E}_p}} \tag{3.2-13} \]

as the basis for the general PAM signal of the form $s_m(t) = A_m p(t)$ and

\[ \phi(t) = \sqrt{\frac{2}{\mathcal{E}_g}}\, g(t) \cos 2\pi f_c t \tag{3.2-14} \]

as the basis for the bandpass PAM signal given in Equation 3.2-8. Using these basis
signals, we have

\[ s_m(t) = A_m \sqrt{\mathcal{E}_p}\, \phi(t) \quad \text{for baseband PAM} \]
\[ s_m(t) = A_m \sqrt{\frac{\mathcal{E}_g}{2}}\, \phi(t) \quad \text{for bandpass PAM} \tag{3.2-15} \]

From above the one-dimensional vector representations for these signals are of the
form

\[ s_m = A_m \sqrt{\mathcal{E}_p}, \qquad A_m = \pm 1, \pm 3, \ldots, \pm(M - 1) \tag{3.2-16} \]
\[ s_m = A_m \sqrt{\frac{\mathcal{E}_g}{2}}, \qquad A_m = \pm 1, \pm 3, \ldots, \pm(M - 1) \tag{3.2-17} \]

The corresponding signal space diagrams for $M = 2$, $M = 4$, and $M = 8$ are shown
in Figure 3.2-1.

The bandpass digital PAM is also called amplitude-shift keying (ASK). The mapping
or assignment of $k$ information bits to the $M = 2^k$ possible signal amplitudes may
be done in a number of ways. The preferred assignment is one in which the adjacent
signal amplitudes differ by one binary digit as illustrated in Figure 3.2-1. This mapping
is called Gray coding. It is important in the demodulation of the signal because the most
likely errors caused by noise involve the erroneous selection of an adjacent amplitude
to the transmitted signal amplitude. In such a case, only a single bit error occurs in the
$k$-bit sequence.

FIGURE 3.2-1

Constellation for PAM signaling: (a) $M = 2$; (b) $M = 4$, with labels 00, 01, 11, 10;
(c) $M = 8$, with labels 000, 001, 011, 010, 110, 111, 101, 100.
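
One common way to produce a Gray labeling is the binary-reflected construction, in which the label of the $n$th amplitude is $n \oplus \lfloor n/2 \rfloor$. The Python sketch below uses that construction (an assumption; the text does not say how its labels were generated) and checks that adjacent labels differ in exactly one bit. The function names are illustrative.

```python
def gray_labels(k):
    """Binary-reflected Gray labels for 2**k adjacent amplitudes, as k-bit strings."""
    return [format(n ^ (n >> 1), f"0{k}b") for n in range(2 ** k)]


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length labels differ."""
    return sum(x != y for x, y in zip(a, b))


labels_4 = gray_labels(2)
labels_8 = gray_labels(3)
print(labels_4)
print(labels_8)

# Adjacent amplitudes carry labels that differ in exactly one bit.
assert all(hamming_distance(a, b) == 1 for a, b in zip(labels_8, labels_8[1:]))
```

For $k = 2$ and $k = 3$ this construction gives the same label order as the one listed for Figure 3.2-1.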

We note that the Euclidean distance between any pair of signal points is

\[ d_{mn} = \sqrt{\|s_m - s_n\|^2} \tag{3.2-18} \]
\[ = |A_m - A_n| \sqrt{\mathcal{E}_p} \tag{3.2-19} \]
\[ = |A_m - A_n| \sqrt{\frac{\mathcal{E}_g}{2}} \tag{3.2-20} \]

where the last relation corresponds to a bandpass PAM. For adjacent signal points
$|A_m - A_n| = 2$, and hence the minimum distance of the constellation is given by

\[ d_{min} = 2\sqrt{\mathcal{E}_p} = \sqrt{2\mathcal{E}_g} \tag{3.2-21} \]

We can express the minimum distance of an $M$-ary PAM system in terms of its $\mathcal{E}_{bavg}$
by solving Equations 3.2-6 and 3.2-12 for $\mathcal{E}_p$ and $\mathcal{E}_g$, respectively, and substituting the
result in Equation 3.2-21. The resulting expression is

\[ d_{min} = \sqrt{\frac{12 \log_2 M}{M^2 - 1}\, \mathcal{E}_{bavg}} \tag{3.2-22} \]

The carrier-modulated PAM signal represented by Equation 3.2-8 is a
double-sideband (DSB) signal and requires twice the channel bandwidth of the equivalent
lowpass signal for transmission. Alternatively, we may use single-sideband (SSB) PAM,
which has the representation (lower or upper sideband)

\[ s_m(t) = \mathrm{Re}\left[A_m \left(g(t) \pm j\hat{g}(t)\right) e^{j2\pi f_c t}\right], \qquad m = 1, 2, \ldots, M \tag{3.2-23} \]

where $\hat{g}(t)$ is the Hilbert transform of $g(t)$. Thus, the bandwidth of the SSB signal is
one-half that of the DSB signal.

A four-amplitude level baseband PAM signal is illustrated in Figure 3.2-2(a). The
carrier-modulated version of the signal is shown in Figure 3.2-2(b).

FIGURE 3.2-2

Example of (a) baseband and (b) carrier-modulated PAM signals.

In the special case of $M = 2$, or binary signals, the PAM waveforms have the
special property that $s_1(t) = -s_2(t)$. Hence, these two signals have the same energy
and a cross-correlation coefficient of $-1$. Such signals are called antipodal. This case
is sometimes called binary antipodal signaling.
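
For baseband PAM with amplitudes $\pm 1, \pm 3, \ldots, \pm(M-1)$, adjacent points are separated by $2\sqrt{\mathcal{E}_p}$, and expressing $\mathcal{E}_p$ through the average energy per bit gives $d_{min} = \sqrt{12 \log_2 M\, \mathcal{E}_{bavg}/(M^2-1)}$. The Python sketch below compares a brute-force search over the constellation with that expression; the function names and the value $\mathcal{E}_p = 1$ are illustrative assumptions.

```python
import math


def pam_points(M, Ep=1.0):
    """One-dimensional PAM signal points A_m * sqrt(Ep), A_m = 2m - 1 - M."""
    return [(2 * m - 1 - M) * math.sqrt(Ep) for m in range(1, M + 1)]


def min_distance_1d(points):
    """Smallest distance between any two distinct points on the line."""
    ordered = sorted(points)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def pam_dmin_from_ebavg(M, Ebavg):
    """Minimum distance in terms of the average energy per bit."""
    return math.sqrt(12.0 * math.log2(M) / (M * M - 1) * Ebavg)


for M in (2, 4, 8):
    Ep = 1.0
    Ebavg = (M * M - 1) * Ep / (3.0 * math.log2(M))   # average energy per bit
    print(M, min_distance_1d(pam_points(M, Ep)), pam_dmin_from_ebavg(M, Ebavg))
```

For a fixed energy per bit, the expression shows the minimum distance shrinking as $M$ grows, which is the price paid for carrying more bits per symbol in one dimension.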


3.2-2 Phase Modulation 

In digital phase modulation, the $M$ signal waveforms are represented as

\[ s_m(t) = \mathrm{Re}\left[g(t)\, e^{j\frac{2\pi(m-1)}{M}}\, e^{j2\pi f_c t}\right], \qquad m = 1, 2, \ldots, M \]
\[ = g(t) \cos\left[2\pi f_c t + \frac{2\pi}{M}(m - 1)\right] \]
\[ = g(t) \cos\left(\frac{2\pi}{M}(m - 1)\right) \cos 2\pi f_c t - g(t) \sin\left(\frac{2\pi}{M}(m - 1)\right) \sin 2\pi f_c t \tag{3.2-24} \]

where $g(t)$ is the signal pulse shape and $\theta_m = 2\pi(m - 1)/M$, $m = 1, 2, \ldots, M$, are
the $M$ possible phases of the carrier that convey the transmitted information. Digital
phase modulation is usually called phase-shift keying (PSK). We note that these signal
waveforms have equal energy. From Equation 2.1-21,

\[ \mathcal{E}_{avg} = \mathcal{E}_m = \frac{1}{2} \mathcal{E}_g \tag{3.2-25} \]

and therefore,

\[ \mathcal{E}_{bavg} = \frac{\mathcal{E}_g}{2 \log_2 M} \tag{3.2-26} \]

For this case, instead of $\mathcal{E}_{avg}$ and $\mathcal{E}_{bavg}$ we use the notation $\mathcal{E}$ and $\mathcal{E}_b$.

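Using the result of Example 2.1-1, we note that $g(t) \cos 2\pi f_c t$ and $g(t) \sin 2\pi f_c t$
are orthogonal, and therefore $\phi_1(t)$ and $\phi_2(t)$ given as

\[ \phi_1(t) = \sqrt{\frac{2}{\mathcal{E}_g}}\, g(t) \cos 2\pi f_c t \tag{3.2-27} \]
\[ \phi_2(t) = -\sqrt{\frac{2}{\mathcal{E}_g}}\, g(t) \sin 2\pi f_c t \tag{3.2-28} \]

can be used for expansion of $s_m(t)$, $1 \le m \le M$, as

\[ s_m(t) = \sqrt{\frac{\mathcal{E}_g}{2}} \cos\left(\frac{2\pi}{M}(m - 1)\right) \phi_1(t) + \sqrt{\frac{\mathcal{E}_g}{2}} \sin\left(\frac{2\pi}{M}(m - 1)\right) \phi_2(t) \tag{3.2-29} \]

therefore the signal space dimensionality is $N = 2$ and the resulting vector
representations are

\[ s_m = \left(\sqrt{\frac{\mathcal{E}_g}{2}} \cos\left(\frac{2\pi}{M}(m - 1)\right),\ \sqrt{\frac{\mathcal{E}_g}{2}} \sin\left(\frac{2\pi}{M}(m - 1)\right)\right), \qquad m = 1, 2, \ldots, M \tag{3.2-30} \]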

Signal space diagrams for BPSK (binary PSK, M = 2), QPSK (quaternary PSK, 
M = 4), and 8-PSK are shown in Figure 3.2-3. We note that BPSK corresponds to 
one-dimensional signals, which are identical to binary PAM signals. These signaling 
schemes are special cases of binary antipodal signaling discussed earlier. 

As is the case with PAM, the mapping or assignment of k information bits to the 
M = 2 k possible phases may be done in a number of ways. The preferred assignment 
is Gray encoding, so that the most likely errors caused by noise will result in a single 
bit error in the k-bit symbol. 

The Euclidean distance between signal points is

\[ d_{mn} = \sqrt{\|s_m - s_n\|^2} = \sqrt{\mathcal{E}_g \left[1 - \cos\left(\frac{2\pi}{M}(m - n)\right)\right]} \tag{3.2-31} \]

and the minimum distance corresponding to $|m - n| = 1$ is

\[ d_{min} = \sqrt{\mathcal{E}_g \left(1 - \cos\frac{2\pi}{M}\right)} = \sqrt{2\mathcal{E}_g \sin^2\frac{\pi}{M}} \tag{3.2-32} \]

Solving Equation 3.2-26 for $\mathcal{E}_g$ and substituting the result in Equation 3.2-32 result in

\[ d_{min} = 2\sqrt{\left(\log_2 M \times \sin^2\frac{\pi}{M}\right) \mathcal{E}_b} \tag{3.2-33} \]

For large values of $M$, we have $\sin\frac{\pi}{M} \approx \frac{\pi}{M}$, and $d_{min}$ can be approximated by

\[ d_{min} \approx 2\sqrt{\frac{\pi^2 \log_2 M}{M^2}\, \mathcal{E}_b} \tag{3.2-34} \]

FIGURE 3.2-3

Signal space diagrams for BPSK, QPSK, and 8-PSK: $M = 2$ (BPSK), $M = 4$ (QPSK), $M = 8$ (Octal PSK).

A variant of four-phase PSK (QPSK), called $\frac{\pi}{4}$-QPSK, is obtained by introducing
an additional $\pi/4$ phase shift in the carrier phase in each symbol interval. This phase
shift facilitates symbol synchronization.
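
PSK signal points lie on a circle of radius $\sqrt{\mathcal{E}_g/2}$ at angles $2\pi(m-1)/M$, so their minimum distance can be found by brute force and compared with the closed form $2\sqrt{\log_2 M \sin^2(\pi/M)\, \mathcal{E}_b}$ and with its large-$M$ approximation. The Python sketch below does this; the function names and the value $\mathcal{E}_g = 2$ (unit radius) are illustrative assumptions.

```python
import math


def psk_points(M, Eg=2.0):
    """PSK signal vectors on a circle of radius sqrt(Eg / 2)."""
    radius = math.sqrt(Eg / 2.0)
    return [(radius * math.cos(2 * math.pi * m / M),
             radius * math.sin(2 * math.pi * m / M)) for m in range(M)]


def min_distance(points):
    """Brute-force minimum Euclidean distance over all pairs of points."""
    return min(math.dist(p, q)
               for i, p in enumerate(points) for q in points[i + 1:])


def psk_dmin(M, Eb):
    """Closed-form minimum distance in terms of the energy per bit."""
    return 2.0 * math.sqrt(math.log2(M) * math.sin(math.pi / M) ** 2 * Eb)


def psk_dmin_large_m(M, Eb):
    """Approximation using sin(pi/M) ~ pi/M, intended for large M."""
    return 2.0 * math.sqrt(math.pi ** 2 * math.log2(M) / M ** 2 * Eb)


Eg = 2.0
for M in (2, 4, 8, 64):
    Eb = Eg / (2.0 * math.log2(M))   # energy per bit for equal-energy PSK signals
    print(M, min_distance(psk_points(M, Eg)), psk_dmin(M, Eb), psk_dmin_large_m(M, Eb))
```

The printed values make it easy to see how quickly the approximation approaches the exact expression as $M$ increases.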


3.2-3 Quadrature Amplitude Modulation 

The bandwidth efficiency of PAM/SSB can also be obtained by simultaneously
impressing two separate $k$-bit symbols from the information sequence on two quadrature carriers
$\cos 2\pi f_c t$ and $\sin 2\pi f_c t$. The resulting modulation technique is called quadrature PAM
or QAM, and the corresponding signal waveforms may be expressed as

\[ s_m(t) = \mathrm{Re}\left[(A_{mi} + jA_{mq})\, g(t)\, e^{j2\pi f_c t}\right] \]
\[ = A_{mi}\, g(t) \cos 2\pi f_c t - A_{mq}\, g(t) \sin 2\pi f_c t, \qquad m = 1, 2, \ldots, M \tag{3.2-35} \]

where $A_{mi}$ and $A_{mq}$ are the information-bearing signal amplitudes of the quadrature
carriers and $g(t)$ is the signal pulse. Alternatively, the QAM signal waveforms may be
expressed as

\[ s_m(t) = \mathrm{Re}\left[r_m e^{j\theta_m} e^{j2\pi f_c t}\right] \]
\[ = r_m \cos(2\pi f_c t + \theta_m) \tag{3.2-36} \]

where $r_m = \sqrt{A_{mi}^2 + A_{mq}^2}$ and $\theta_m = \tan^{-1}(A_{mq}/A_{mi})$. From this expression, it is
apparent that the QAM signal waveforms may be viewed as combined amplitude ($r_m$)
and phase ($\theta_m$) modulation. In fact, we may select any combination of $M_1$-level PAM and
$M_2$-phase PSK to construct an $M = M_1 M_2$ combined PAM-PSK signal constellation.
If $M_1 = 2^n$ and $M_2 = 2^m$, the combined PAM-PSK signal constellation results in the
simultaneous transmission of $m + n = \log_2 M_1 M_2$ binary digits occurring at a symbol
rate $R/(m + n)$.

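From Equation 3.2-35, it can be seen that, similar to the PSK case, $\phi_1(t)$ and $\phi_2(t)$
given in Equations 3.2-27 and 3.2-28 can be used as an orthonormal basis for expansion
of QAM signals. The dimensionality of the signal space for QAM is N = 2. Using this
basis, we have

$$s_m(t) = A_{mi}\sqrt{\frac{\mathcal{E}_g}{2}}\,\phi_1(t) + A_{mq}\sqrt{\frac{\mathcal{E}_g}{2}}\,\phi_2(t) \tag{3.2-37}$$

which results in vector representations of the form

$$\boldsymbol{s}_m = (s_{m1}, s_{m2}) = \left(A_{mi}\sqrt{\frac{\mathcal{E}_g}{2}},\; A_{mq}\sqrt{\frac{\mathcal{E}_g}{2}}\right) \tag{3.2-38}$$

and

$$\mathcal{E}_m = \|\boldsymbol{s}_m\|^2 = \frac{\mathcal{E}_g}{2}\left(A_{mi}^2 + A_{mq}^2\right) \tag{3.2-39}$$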

Examples of signal space diagrams for combined PAM-PSK are shown in 
Figure 3.2-4, for M = 8 and M = 16. 

The Euclidean distance between any pair of signal vectors in QAM is 


$$\begin{aligned}
d_{mn} &= \sqrt{\|\boldsymbol{s}_m - \boldsymbol{s}_n\|^2} \\
&= \sqrt{\frac{\mathcal{E}_g}{2}\left[(A_{mi} - A_{ni})^2 + (A_{mq} - A_{nq})^2\right]}
\end{aligned} \tag{3.2-40}$$

FIGURE 3.2-4

Examples of combined PAM-PSK constellations (M = 8 and M = 16).


In the special case where the signal amplitudes take the set of discrete values
{(2m − 1 − M), m = 1, 2, ..., M}, the signal space diagram is rectangular, as shown
in Figure 3.2-5. In this case, the Euclidean distance between adjacent points, i.e., the
minimum distance, is

$$d_{\min} = \sqrt{2\mathcal{E}_g} \tag{3.2-41}$$


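which is the same result as for PAM. In the special case of a rectangular constellation
with $M = 2^{2k_1}$, i.e., M = 4, 16, 64, 256, ..., and with amplitudes of ±1, ±3, ...,
±(√M − 1) in both directions, from Equation 3.2-39 we have

$$\begin{aligned}
\mathcal{E}_{\text{avg}} &= \frac{1}{M}\,\frac{\mathcal{E}_g}{2}\sum_{m=1}^{\sqrt{M}}\sum_{n=1}^{\sqrt{M}}\left(A_m^2 + A_n^2\right) \\
&= \frac{\mathcal{E}_g}{2M}\times\frac{2M(M - 1)}{3} \\
&= \frac{M - 1}{3}\,\mathcal{E}_g
\end{aligned} \tag{3.2-42}$$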




FIGURE 3.2-5

Several signal space diagrams for rectangular QAM (M = 4, 8, 16, 32, and 64).

from which

$$\mathcal{E}_{b\text{avg}} = \frac{M - 1}{3\log_2 M}\,\mathcal{E}_g \tag{3.2-43}$$

Using Equation 3.2-41, we obtain

$$d_{\min} = \sqrt{\frac{6\log_2 M}{M - 1}\,\mathcal{E}_{b\text{avg}}} \tag{3.2-44}$$
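
The closed forms in Equations 3.2-42 to 3.2-44 can be checked by brute force. The sketch below enumerates a square QAM constellation with amplitudes ±1, ±3, ..., ±(√M − 1), averages the symbol energy of Equation 3.2-39, and evaluates the minimum distance of Equation 3.2-44. The function names and the choice $\mathcal{E}_g = 1$ are assumptions of this example.

```python
import math
from itertools import product


def square_qam_points(M):
    """Amplitude pairs (A_i, A_q) with levels +-1, +-3, ..., +-(sqrt(M) - 1)."""
    side = math.isqrt(M)
    if side * side != M:
        raise ValueError("M must be a perfect square for a square constellation")
    levels = [2 * m - 1 - side for m in range(1, side + 1)]
    return list(product(levels, levels))


def average_energy(M, Eg=1.0):
    """Average of E_m = (Eg / 2) * (A_i^2 + A_q^2) over all M points (Eq. 3.2-39)."""
    points = square_qam_points(M)
    return sum(Eg / 2 * (ai * ai + aq * aq) for ai, aq in points) / M


def dmin_from_Ebavg(M, Ebavg=1.0):
    """Minimum distance in terms of the average energy per bit (Eq. 3.2-44)."""
    return math.sqrt(6 * math.log2(M) / (M - 1) * Ebavg)


if __name__ == "__main__":
    for M in (4, 16, 64):
        # The enumerated average should agree with (M - 1) / 3 for Eg = 1.
        print(M, average_energy(M), (M - 1) / 3)
```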


Table 3.2-1 summarizes some basic properties of the modulation schemes discussed
above. In this table it is assumed that for PAM and QAM signaling, the amplitudes
are ±1, ±3, ..., ±(M − 1) and the QAM signaling has a rectangular √M × √M
constellation.

From the discussion of bandpass PAM, PSK, and QAM, it is clear that all these
signaling schemes are of the general form

$$s_m(t) = \operatorname{Re}\left[A_m\,g(t)\,e^{j2\pi f_c t}\right], \qquad m = 1, 2, \ldots, M \tag{3.2-45}$$

where $A_m$ is determined by the signaling scheme. For PAM, $A_m$ is real, generally equal
to ±1, ±3, ..., ±(M − 1); for M-ary PSK, $A_m$ is complex and equal to $e^{j2\pi(m-1)/M}$;
and finally for QAM, $A_m$ is a general complex number $A_m = A_{mi} + jA_{mq}$. In this
sense it is seen that these three signaling schemes belong to the same family, and
PAM and PSK can be considered as special cases of QAM. In QAM signaling, both
amplitude and phase carry information, whereas in PAM and PSK only amplitude
or phase carries the information. Also note that in these schemes the dimensionality
of the signal space is rather low (one for PAM and two for PSK and QAM) and is
independent of the constellation size M. The structure of the modulator for this general
class of signaling schemes is shown in Figure 3.2-6, where $\phi_1(t)$ and $\phi_2(t)$ are given by
Equation 3.2-21. Note that the modulator consists of a vector mapper, which maps each
of the M messages onto a constellation of size M, followed by a two-dimensional (or
one-dimensional, in case of PAM) vector to signal mapper as was previously shown in
Figure 2.2-2.


FIGURE 3.2-6

A general QAM modulator.


3.2-4 Multidimensional Signaling

It is apparent from the discussion above that the digital modulation of the carrier
amplitude and phase allows us to construct signal waveforms that correspond to
two-dimensional vectors and signal space diagrams. If we wish to construct signal
waveforms corresponding to higher-dimensional vectors, we may use either the time domain
or the frequency domain or both to increase the number of dimensions. Suppose we
have N-dimensional signal vectors. For any N, we may subdivide a time interval of
length $T_1 = NT$ into N subintervals of length $T = T_1/N$. In each subinterval of
length T, we may use binary PAM (a one-dimensional signal) to transmit an element
of the N-dimensional signal vector. Thus, the N time slots are used to transmit the
N-dimensional signal vector. If N is even, a time slot of length T may be used to
simultaneously transmit two components of the N-dimensional vector by modulating
the amplitude of quadrature carriers independently by the corresponding components.
In this manner, the N-dimensional signal vector is transmitted in ½NT seconds (½N
time slots). Alternatively, a frequency band of width NΔf may be subdivided into N
frequency slots each of width Δf. An N-dimensional signal vector can be transmitted
over the channel by simultaneously modulating the amplitude of N carriers, one in each
of the N frequency slots. Care must be taken to provide sufficient frequency separation
Δf between successive carriers so that there is no cross-talk interference among the
signals on the N carriers. If quadrature carriers are used in each frequency slot, the
N-dimensional vector (even N) may be transmitted in ½N frequency slots, thus reducing
the channel bandwidth utilization by a factor of 2. More generally, we may use both
the time and frequency domains jointly to transmit an N-dimensional signal vector.
For example, Figure 3.2-7 illustrates a subdivision of the time and frequency axes into
12 slots. Thus, an N = 12-dimensional signal vector may be transmitted by PAM or
an N = 24-dimensional signal vector may be transmitted by use of two quadrature
carriers (QAM) in each slot.


Orthogonal Signaling 

Orthogonal signals are defined as a set of equal-energy signals $s_m(t)$, 1 ≤ m ≤ M, such
that

$$\langle s_m(t), s_n(t)\rangle = 0, \qquad m \ne n \text{ and } 1 \le m, n \le M \tag{3.2-46}$$

FIGURE 3.2-7

Subdivision of time and frequency axes into distinct slots.

With this definition it is clear that

$$\langle s_m(t), s_n(t)\rangle = \begin{cases} \mathcal{E} & m = n \\ 0 & m \ne n \end{cases} \qquad 1 \le m, n \le M \tag{3.2-47}$$

Obviously the signals are linearly independent and hence N = M. The orthonormal
set $\{\phi_j(t), 1 \le j \le N\}$ given by

$$\phi_j(t) = \frac{s_j(t)}{\sqrt{\mathcal{E}}}, \qquad 1 \le j \le N \tag{3.2-48}$$

can be used as an orthonormal basis for representation of $\{s_m(t), 1 \le m \le M\}$. The
resulting vector representation of the signals will be

$$\begin{aligned}
\boldsymbol{s}_1 &= (\sqrt{\mathcal{E}}, 0, 0, \ldots, 0) \\
\boldsymbol{s}_2 &= (0, \sqrt{\mathcal{E}}, 0, \ldots, 0) \\
&\;\;\vdots \\
\boldsymbol{s}_M &= (0, 0, \ldots, 0, \sqrt{\mathcal{E}})
\end{aligned} \tag{3.2-49}$$

From Equation 3.2-49 it is seen that for all m ≠ n we have

$$d_{mn} = \sqrt{2\mathcal{E}} \tag{3.2-50}$$

and therefore,

$$d_{\min} = \sqrt{2\mathcal{E}} \tag{3.2-51}$$

in all orthogonal signaling schemes. Using the relation

$$\mathcal{E}_b = \frac{\mathcal{E}}{\log_2 M} \tag{3.2-52}$$

we conclude that

$$d_{\min} = \sqrt{2\log_2 M\,\mathcal{E}_b} \tag{3.2-53}$$


Frequency-Shift Keying (FSK) As a special case of the construction of orthogonal 
signals, let us consider the construction of orthogonal signal waveforms that differ in 
frequency and are represented as 


$$\begin{aligned}
s_m(t) &= \operatorname{Re}\left[s_{ml}(t)\,e^{j2\pi f_c t}\right], \qquad 1 \le m \le M,\; 0 \le t \le T \\
&= \sqrt{\frac{2\mathcal{E}}{T}}\cos(2\pi f_c t + 2\pi m\,\Delta f\,t)
\end{aligned} \tag{3.2-54}$$

where

$$s_{ml}(t) = \sqrt{\frac{2\mathcal{E}}{T}}\,e^{j2\pi m\,\Delta f\,t}, \qquad 1 \le m \le M,\; 0 \le t \le T \tag{3.2-55}$$

The coefficient $\sqrt{2\mathcal{E}/T}$ is introduced to guarantee that each signal has an energy equal to
$\mathcal{E}$. This type of signaling, in which the messages are transmitted by signals that differ
in frequency, is called frequency-shift keying (FSK). Note a major difference between
FSK and QAM signals (of which ASK and PSK can be considered as special cases). In
QAM signaling the lowpass equivalent of the signal is of the form $A_m g(t)$ where $A_m$ is
a complex number. Therefore the sum of two lowpass equivalent signals corresponding 
to two different signals is of the general form of the lowpass equivalent of a QAM 
signal. In this sense, the sum of two QAM signals is another QAM signal. For this 
reason, ASK, PSK, and QAM are sometimes called linear modulation schemes. On the 
other hand, FSK signaling does not satisfy this property, and therefore it belongs to the 
class of nonlinear modulation schemes. 

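By using Equation 2.1-26, it is clear that for this set of signals to be orthogonal,
we need to have

$$\operatorname{Re}\left[\int_0^T s_{ml}(t)\,s_{nl}^*(t)\,dt\right] = 0 \tag{3.2-56}$$

for all m ≠ n. But

$$\begin{aligned}
\langle s_{ml}(t), s_{nl}(t)\rangle &= \frac{2\mathcal{E}}{T}\int_0^T e^{j2\pi(m-n)\Delta f\,t}\,dt \\
&= \frac{2\mathcal{E}\sin(\pi T(m-n)\Delta f)}{\pi T(m-n)\Delta f}\,e^{j\pi T(m-n)\Delta f}
\end{aligned} \tag{3.2-57}$$

and

$$\begin{aligned}
\operatorname{Re}\left[\langle s_{ml}(t), s_{nl}(t)\rangle\right] &= \frac{2\mathcal{E}\sin(\pi T(m-n)\Delta f)}{\pi T(m-n)\Delta f}\cos(\pi T(m-n)\Delta f) \\
&= \frac{2\mathcal{E}\sin(2\pi T(m-n)\Delta f)}{2\pi T(m-n)\Delta f} \\
&= 2\mathcal{E}\,\operatorname{sinc}(2T(m-n)\Delta f)
\end{aligned} \tag{3.2-58}$$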


From Equation 3.2-58 we observe that $s_m(t)$ and $s_n(t)$ are orthogonal for all m ≠ n
if and only if sinc(2T(m − n)Δf) = 0 for all m ≠ n. This is the case if Δf = k/2T
for some positive integer k. The minimum frequency separation Δf that guarantees
orthogonality is Δf = 1/2T. Note that Δf = 1/T is the minimum frequency separation
that guarantees $\langle s_{ml}(t), s_{nl}(t)\rangle = 0$, thus guaranteeing the orthogonality of the baseband,
as well as the bandpass, frequency-modulated signals.
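
The two separation conditions can be verified numerically. The following sketch evaluates the integral in Equation 3.2-57 in closed form and shows that at Δf = 1/2T only the real part of the lowpass inner product vanishes, whereas at Δf = 1/T the complex inner product itself is zero. The function name and the normalizations T = 1 and $\mathcal{E} = 1$ are assumptions of this example.

```python
import cmath
import math


def fsk_lowpass_inner_product(m, n, delta_f, T=1.0, E=1.0):
    """<s_ml, s_nl> = (2E/T) * integral_0^T exp(j 2 pi (m - n) delta_f t) dt."""
    k = (m - n) * delta_f
    if k == 0:
        return complex(2.0 * E, 0.0)
    # Closed-form integral of the complex exponential over [0, T].
    return (2.0 * E / T) * (cmath.exp(2j * math.pi * k * T) - 1.0) / (2j * math.pi * k)


if __name__ == "__main__":
    half = fsk_lowpass_inner_product(1, 2, delta_f=0.5)  # delta_f = 1/(2T)
    full = fsk_lowpass_inner_product(1, 2, delta_f=1.0)  # delta_f = 1/T
    print("delta_f = 1/(2T):", half)
    print("delta_f = 1/T:   ", full)
```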

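Hadamard signals are orthogonal signals which are constructed from Hadamard
matrices. Hadamard matrices $H_n$ are $2^n \times 2^n$ matrices for n = 1, 2, ... defined by the
following recursive relation

$$H_0 = [1], \qquad H_{n+1} = \begin{bmatrix} H_n & H_n \\ H_n & -H_n \end{bmatrix} \tag{3.2-59}$$

With this definition we have

$$H_1 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad
H_2 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$$

$$H_3 = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1
\end{bmatrix} \tag{3.2-60}$$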


Hadamard matrices are symmetric matrices whose rows (and, by symmetry, columns) 
are orthogonal. Using these matrices, we can generate orthogonal signals. For instance, 
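using $H_2$ would result in the set of signals

$$\begin{aligned}
\boldsymbol{s}_1 &= [\sqrt{\mathcal{E}} \;\; \sqrt{\mathcal{E}} \;\; \sqrt{\mathcal{E}} \;\; \sqrt{\mathcal{E}}] \\
\boldsymbol{s}_2 &= [\sqrt{\mathcal{E}} \;\; -\sqrt{\mathcal{E}} \;\; \sqrt{\mathcal{E}} \;\; -\sqrt{\mathcal{E}}] \\
\boldsymbol{s}_3 &= [\sqrt{\mathcal{E}} \;\; \sqrt{\mathcal{E}} \;\; -\sqrt{\mathcal{E}} \;\; -\sqrt{\mathcal{E}}] \\
\boldsymbol{s}_4 &= [\sqrt{\mathcal{E}} \;\; -\sqrt{\mathcal{E}} \;\; -\sqrt{\mathcal{E}} \;\; \sqrt{\mathcal{E}}]
\end{aligned} \tag{3.2-61}$$

This set of orthogonal signals may be used to modulate any four-dimensional orthonormal
basis $\{\phi_j(t)\}_{j=1}^{4}$ to generate signals of the form

$$s_m(t) = \sum_{j=1}^{4} s_{mj}\,\phi_j(t), \qquad 1 \le m \le 4 \tag{3.2-62}$$

Note that the energy in each signal is $4\mathcal{E}$, and each signal carries 2 bits of information,
hence $\mathcal{E}_b = 2\mathcal{E}$.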
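
The recursion of Equation 3.2-59 is easy to express in code. The sketch below builds $H_n$ as a list of rows and checks that distinct rows are orthogonal; the helper names are chosen for this example only.

```python
def hadamard(n):
    """Return H_n as a list of rows, starting from H_0 = [1] (Equation 3.2-59)."""
    H = [[1]]
    for _ in range(n):
        top = [row + row for row in H]                      # [H_n  H_n]
        bottom = [row + [-x for x in row] for row in H]     # [H_n -H_n]
        H = top + bottom
    return H


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    H2 = hadamard(2)
    for row in H2:
        print(row)
    # Gram matrix: 4 on the diagonal, 0 elsewhere, so the rows are orthogonal.
    print([[dot(r, s) for s in H2] for r in H2])
```

Scaling each row of $H_2$ by $\sqrt{\mathcal{E}}$ gives the four signal vectors listed above, each with energy $4\mathcal{E}$.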


Biorthogonal Signaling 

A set of M biorthogonal signals can be constructed from ½M orthogonal signals by
simply including the negatives of the orthogonal signals. Thus, we require N = ½M
dimensions for the construction of a set of M biorthogonal signals. Figure 3.2-8
illustrates the biorthogonal signals for M = 4 and 6. We note that the correlation between
any pair of waveforms is ρ = −1 or 0. The corresponding distances are $d = 2\sqrt{\mathcal{E}}$ or
$\sqrt{2\mathcal{E}}$, with the latter being the minimum distance.

FIGURE 3.2-8

Signal space diagram for M = 4 and M = 6 biorthogonal signals. 


Simplex Signaling 

Suppose we have a set of M orthogonal waveforms $\{s_m(t)\}$ or, equivalently, their vector
representation $\{\boldsymbol{s}_m\}$. Their mean is

$$\bar{\boldsymbol{s}} = \frac{1}{M}\sum_{m=1}^{M}\boldsymbol{s}_m \tag{3.2-63}$$

Now, let us construct another set of M signals by subtracting the mean from each of
the M orthogonal signals. Thus,

$$\boldsymbol{s}'_m = \boldsymbol{s}_m - \bar{\boldsymbol{s}}, \qquad m = 1, 2, \ldots, M \tag{3.2-64}$$


The effect of the subtraction is to translate the origin of the M orthogonal signals to
the point $\bar{\boldsymbol{s}}$. The resulting signal waveforms are called simplex signals and have the
following properties. First, the energy per waveform is

$$\begin{aligned}
\|\boldsymbol{s}'_m\|^2 &= \|\boldsymbol{s}_m - \bar{\boldsymbol{s}}\|^2 \\
&= \mathcal{E} - \frac{2}{M}\mathcal{E} + \frac{1}{M}\mathcal{E} \\
&= \mathcal{E}\left(1 - \frac{1}{M}\right)
\end{aligned} \tag{3.2-65}$$

Second, the cross-correlation of any pair of signals is

$$\operatorname{Re}[\rho_{mn}] = \frac{\boldsymbol{s}'_m \cdot \boldsymbol{s}'_n}{\|\boldsymbol{s}'_m\|\,\|\boldsymbol{s}'_n\|} = \frac{-1/M}{1 - 1/M} = -\frac{1}{M - 1} \tag{3.2-66}$$

Hence, the set of simplex waveforms is equally correlated and requires less energy, by
the factor 1 − 1/M, than the set of orthogonal waveforms. Since only the origin was
translated, the distance between any pair of signal points is maintained at $d = \sqrt{2\mathcal{E}}$,
which is the same as the distance between any pair of orthogonal signals. Figure 3.2-9
illustrates the simplex signals for M = 2, 3, and 4. Note that the signal dimensionality
is N = M − 1.

FIGURE 3.2-9

Signal space diagrams for M-ary simplex
signals.

Note that the class of orthogonal, biorthogonal, and simplex signals has many
common properties. The signal space dimensionality in this class is highly dependent
on the constellation size. This is in contrast to PAM, PSK, and QAM systems. Also,
for fixed $\mathcal{E}_b$, the minimum distance $d_{\min}$ in these systems increases with increasing M.
This again is in sharp contrast to PAM, PSK, and QAM signaling. We will see later in
Chapter 4 that similar contrasts in power and bandwidth efficiency exist between these
two classes of signaling schemes.
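
The simplex construction of Equations 3.2-63 to 3.2-66 can be reproduced in a few lines. This sketch starts from M orthogonal vectors of energy $\mathcal{E}$, subtracts their mean, and exposes helpers for checking the energy, correlation, and distance properties stated above. The function names and the default $\mathcal{E} = 1$ are assumptions of this example.

```python
import math


def simplex_from_orthogonal(M, E=1.0):
    """Subtract the mean from M orthogonal vectors of energy E (Eqs. 3.2-63, 3.2-64)."""
    a = math.sqrt(E)
    orthogonal = [[a if j == m else 0.0 for j in range(M)] for m in range(M)]
    mean = [sum(column) / M for column in zip(*orthogonal)]
    return [[x - mu for x, mu in zip(s, mean)] for s in orthogonal]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(u, v))


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    M = 4
    s = simplex_from_orthogonal(M)
    print("energy per signal:", dot(s[0], s[0]))              # compare with 1 - 1/M
    print("correlation:", dot(s[0], s[1]) / dot(s[0], s[0]))  # compare with -1/(M - 1)
    print("distance:", distance(s[0], s[1]))                  # compare with sqrt(2)
```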

Signal Waveforms from Binary Codes 

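A set of M signaling waveforms can be generated from a set of M binary code words
of the form

$$\boldsymbol{c}_m = [c_{m1} \; c_{m2} \; \cdots \; c_{mN}], \qquad m = 1, 2, \ldots, M \tag{3.2-67}$$

where $c_{mj} = 0$ or 1 for all m and j. Each component of a code word is mapped into an
elementary binary PSK waveform as follows:

$$\begin{aligned}
c_{mj} = 1 &\;\Longrightarrow\; \sqrt{\frac{2\mathcal{E}_c}{T_c}}\cos 2\pi f_c t, \qquad 0 \le t \le T_c \\
c_{mj} = 0 &\;\Longrightarrow\; -\sqrt{\frac{2\mathcal{E}_c}{T_c}}\cos 2\pi f_c t, \qquad 0 \le t \le T_c
\end{aligned} \tag{3.2-68}$$

where $T_c = T/N$ and $\mathcal{E}_c = \mathcal{E}/N$. Thus, the M code words $\{\boldsymbol{c}_m\}$ are mapped into a set
of M waveforms $\{s_m(t)\}$. The waveforms can be represented in vector form as

$$\boldsymbol{s}_m = [s_{m1} \; s_{m2} \; \cdots \; s_{mN}], \qquad m = 1, 2, \ldots, M \tag{3.2-69}$$

FIGURE 3.2-10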

Signal space diagrams for signals generated from binary codes. 


where $s_{mj} = \pm\sqrt{\mathcal{E}/N}$ for all m and j. Also N is called the block length of the code, and
it is the dimension of the M waveforms. We note that there are $2^N$ possible waveforms
that can be constructed from the $2^N$ possible binary code words. We may select a subset
of $M < 2^N$ signal waveforms for transmission of the information. We also observe that
the $2^N$ possible signal points correspond to the vertices of an N-dimensional hypercube
with its center at the origin. Figure 3.2-10 illustrates the signal points in N = 2 and 3
dimensions. Each of the M waveforms has energy $\mathcal{E}$. The cross-correlation between any
pair of waveforms depends on how we select the M waveforms from the $2^N$ possible
waveforms. This topic is treated in detail in Chapters 7 and 8. Clearly, any adjacent
signal points have a cross-correlation coefficient

$$\rho = \frac{\mathcal{E}(1 - 2/N)}{\mathcal{E}} = \frac{N - 2}{N} \tag{3.2-70}$$

and a corresponding distance of

$$d_{\min} = \sqrt{2\mathcal{E}(1 - \rho)} = \sqrt{4\mathcal{E}/N} \tag{3.2-71}$$


The Hadamard signals described previously are special cases of signals based on 
codes. 
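
The mapping from code words to hypercube vertices can be illustrated with a short sketch. It maps each bit to $\pm\sqrt{\mathcal{E}/N}$ (bit 1 to the positive amplitude, which is an assumption of this example) and evaluates the correlation coefficient and distance of two adjacent vertices, i.e., code words that differ in a single position, for comparison with Equations 3.2-70 and 3.2-71.

```python
import math


def codeword_to_vector(codeword, E=1.0):
    """Map bits to +-sqrt(E/N); bit 1 -> +, bit 0 -> - (sign convention assumed here)."""
    N = len(codeword)
    a = math.sqrt(E / N)
    return [a if bit else -a for bit in codeword]


def correlation(u, v, E=1.0):
    """Cross-correlation coefficient of two signal vectors of energy E."""
    return sum(x * y for x, y in zip(u, v)) / E


def distance(u, v):
    """Euclidean distance between two signal vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    N = 3
    s1 = codeword_to_vector([0, 0, 0])
    s2 = codeword_to_vector([0, 0, 1])   # differs from s1 in one position
    print("rho:", correlation(s1, s2), "expected:", (N - 2) / N)
    print("d:", distance(s1, s2), "expected:", math.sqrt(4 / N))
```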


■ 3.3 

SIGNALING SCHEMES WITH MEMORY 

We have seen before that signaling schemes with memory can be best explained in 
terms of Markov chains and finite-state machines. The state transition and the outputs 
of the Markov chain are governed by 


$$\begin{aligned}
m_\ell &= f_m(S_{\ell-1}, I_\ell) \\
S_\ell &= f_s(S_{\ell-1}, I_\ell)
\end{aligned} \tag{3.3-1}$$


FIGURE 3.3-1

Examples of baseband signals.

where $I_\ell$ denotes the information sequence and $m_\ell$ is the index of the transmitted
signal $s_{m_\ell}(t)$.

Figure 3.3-1 illustrates three different baseband signals and the corresponding data 
sequence. The first signal, called NRZ (non-return-to-zero), is the simplest. The binary 
information digit 1 is represented by a rectangular pulse of polarity A, and the binary 
digit 0 is represented by a rectangular pulse of polarity — A. Hence, the NRZ modulation 
is memoryless and is equivalent to a binary PAM or a binary PSK signal in a
carrier-modulated system. The NRZI (non-return-to-zero, inverted) signal is different from the
NRZ signal in that transitions from one amplitude level to another occur only when
a 1 is transmitted. The amplitude level remains unchanged when a 0 is transmitted.
This type of signal encoding is called differential encoding. The encoding operation is
described mathematically by the relation

$$b_k = a_k \oplus b_{k-1} \tag{3.3-2}$$

where $\{a_k\}$ is the binary information sequence into the encoder, $\{b_k\}$ is the output
sequence of the encoder, and ⊕ denotes addition modulo 2. When $b_k = 1$, the transmitted
waveform is a rectangular pulse of amplitude A; and when $b_k = 0$, the transmitted
waveform is a rectangular pulse of amplitude −A. Hence, the output of the encoder is
mapped into one of two waveforms in exactly the same manner as for the NRZ signal. 
In other words, NRZI signaling can be considered as a differential encoder followed 
by an NRZ signaling scheme. 
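
The differential encoder of Equation 3.3-2 and its inverse can be sketched as follows. The initial encoder state `b_init` is not specified in the text and is an assumption of this example, as are the function names.

```python
def nrzi_encode(bits, b_init=0):
    """Differential encoding b_k = a_k XOR b_{k-1} (Equation 3.3-2)."""
    encoded, b = [], b_init
    for a in bits:
        b = a ^ b          # the level toggles only when a 1 is transmitted
        encoded.append(b)
    return encoded


def nrzi_decode(encoded, b_init=0):
    """Recover a_k = b_k XOR b_{k-1} from the encoded sequence."""
    decoded, previous = [], b_init
    for b in encoded:
        decoded.append(b ^ previous)
        previous = b
    return decoded


def to_levels(encoded, A=1.0):
    """NRZ mapping of the encoder output: 1 -> +A, 0 -> -A."""
    return [A if b else -A for b in encoded]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 0, 1]
    b = nrzi_encode(data)
    print(b)
    print(to_levels(b))
    print(nrzi_decode(b) == data)
```

Because the decoder only compares adjacent encoder outputs, the memory of the scheme is confined to the single state variable $b_{k-1}$.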

The existence of the differential encoder causes memory in NRZI signaling.
Comparison of Equations 3.3-2 and 3.3-1 indicates that $b_k$ can be considered as the state
of the Markov chain. Since the information sequence is assumed to be binary, there are
two states in the Markov chain, and the state transition diagram of the Markov chain is
shown in Figure 3.3-2. The transition probabilities between states are determined by
the probability of 0 and 1 generated by the source. If the source is equiprobable, all
transition probabilities will be equal to ½ and

$$\boldsymbol{P} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix} \tag{3.3-3}$$

Using this P, we can obtain the steady-state probability distribution as

$$\boldsymbol{p} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \end{bmatrix} \tag{3.3-4}$$

FIGURE 3.3-2

State transition diagram for NRZI signaling.

FIGURE 3.3-3

The trellis diagram for NRZI signaling.

We will use the steady-state probabilities to determine the power spectral density of
modulation schemes with memory later in this chapter.
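
As a small illustration, the steady-state distribution of a two-state chain can be obtained by repeatedly applying the transition matrix to an initial distribution. The sketch below uses simple iteration rather than solving the stationarity equations directly; the function name, the iteration count, and the second (non-equiprobable) matrix are assumptions chosen only for this example.

```python
def stationary_distribution(P, iterations=200):
    """Iterate p <- p P for a 2x2 row-stochastic matrix P, from a non-uniform start."""
    p = [1.0, 0.0]
    for _ in range(iterations):
        p = [p[0] * P[0][0] + p[1] * P[1][0],
             p[0] * P[0][1] + p[1] * P[1][1]]
    return p


if __name__ == "__main__":
    # Equiprobable source: the matrix of Equation 3.3-3.
    print(stationary_distribution([[0.5, 0.5], [0.5, 0.5]]))
    # Hypothetical source that toggles the state with probability 0.3.
    print(stationary_distribution([[0.7, 0.3], [0.3, 0.7]]))
```

Both runs settle at equal state probabilities, since each matrix is symmetric.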

In general, if $P[a_k = 1] = 1 - P[a_k = 0] = p$, we have

$$\boldsymbol{P} = \begin{bmatrix} 1 - p & p \\ p & 1 - p \end{bmatrix} \tag{3.3-5}$$

The steady-state probability distribution in this case is again given by Equation 3.3-4.

Another way to display the memory introduced by the precoding operation is by
means of a trellis diagram. The trellis diagram for the NRZI signal is illustrated in
Figure 3.3-3. The trellis provides exactly the same information concerning the signal
dependence as the state diagram, but also depicts a time evolution of the state transitions.

3.3-1 Continuous-Phase Frequency-Shift Keying (CPFSK)

In this section, we consider a class of digital modulation methods in which the phase of
the signal is constrained to be continuous. This constraint results in a phase or frequency
modulator that has memory.

As seen from Equation 3.2-54, a conventional FSK signal is generated by shifting
the carrier by an amount mΔf, 1 ≤ m ≤ M, to reflect the digital information that is
being transmitted. This type of FSK signal was described in Section 3.2-4, and it is
memoryless. The switching from one frequency to another may be accomplished by having
$M = 2^k$ separate oscillators tuned to the desired frequencies and selecting one of the M
frequencies according to the particular k-bit symbol that is to be transmitted in a signal
interval of duration T = k/R seconds. However, such abrupt switching from one
oscillator output to another in successive signaling intervals results in relatively large spectral
side lobes outside of the main spectral band of the signal; consequently, this method
requires a large frequency band for transmission of the signal. To avoid the use of signals
having large spectral side lobes, the information-bearing signal frequency modulates
a single carrier whose frequency is changed continuously. The resulting
frequency-modulated signal is phase-continuous, and hence, it is called continuous-phase FSK
(CPFSK). This type of FSK signal has memory because the phase of the carrier is
constrained to be continuous. To represent a CPFSK signal, we begin with a PAM signal

$$d(t) = \sum_n I_n\,g(t - nT) \tag{3.3-6}$$


where $\{I_n\}$ denotes the sequence of amplitudes obtained by mapping k-bit blocks of
binary digits from the information sequence $\{a_n\}$ into the amplitude levels ±1, ±3, ...,
±(M − 1) and g(t) is a rectangular pulse of amplitude 1/2T and duration T seconds.
The signal d(t) is used to frequency-modulate the carrier. Consequently, the equivalent
lowpass waveform v(t) is expressed as

$$v(t) = \sqrt{\frac{2\mathcal{E}}{T}}\,\exp\left\{j\left[4\pi T f_d\int_{-\infty}^{t} d(\tau)\,d\tau + \phi_0\right]\right\} \tag{3.3-7}$$


where $f_d$ is the peak frequency deviation and $\phi_0$ is the initial phase of the carrier. The
carrier-modulated signal corresponding to Equation 3.3-7 may be expressed as

$$s(t) = \sqrt{\frac{2\mathcal{E}}{T}}\cos\left[2\pi f_c t + \phi(t; \boldsymbol{I}) + \phi_0\right] \tag{3.3-8}$$

where $\phi(t; \boldsymbol{I})$ represents the time-varying phase of the carrier, which is defined as

$$\begin{aligned}
\phi(t; \boldsymbol{I}) &= 4\pi T f_d\int_{-\infty}^{t} d(\tau)\,d\tau \\
&= 4\pi T f_d\int_{-\infty}^{t}\left[\sum_n I_n\,g(\tau - nT)\right]d\tau
\end{aligned} \tag{3.3-9}$$

Note that, although d(t) contains discontinuities, the integral of d(t) is continuous.
Hence, we have a continuous-phase signal. The phase of the carrier in the interval
nT ≤ t ≤ (n + 1)T is determined by integrating Equation 3.3-9. Thus,

$$\begin{aligned}
\phi(t; \boldsymbol{I}) &= 2\pi f_d T\sum_{k=-\infty}^{n-1} I_k + 4\pi f_d T\,q(t - nT)\,I_n \\
&= \theta_n + 2\pi h I_n\,q(t - nT)
\end{aligned} \tag{3.3-10}$$

where h, $\theta_n$, and q(t) are defined as

$$h = 2 f_d T \tag{3.3-11}$$

$$\theta_n = \pi h\sum_{k=-\infty}^{n-1} I_k \tag{3.3-12}$$

$$q(t) = \begin{cases} 0 & t < 0 \\ \dfrac{t}{2T} & 0 \le t \le T \\ \dfrac{1}{2} & t > T \end{cases} \tag{3.3-13}$$

We observe that $\theta_n$ represents the accumulation (memory) of all symbols up to time
(n − 1)T. The parameter h is called the modulation index.
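
The phase expression of Equation 3.3-10 can be evaluated directly. The sketch below computes $\phi(t; \boldsymbol{I})$ for a finite symbol sequence that is assumed to start at n = 0 with zero accumulated phase; that starting convention, the function names, and the default T = 1 are assumptions of this example.

```python
import math


def q(t, T=1.0):
    """Phase pulse of Equation 3.3-13 (integral of the rectangular g(t))."""
    if t < 0:
        return 0.0
    if t <= T:
        return t / (2.0 * T)
    return 0.5


def cpfsk_phase(t, symbols, h, T=1.0):
    """phi(t; I) = theta_n + 2 pi h I_n q(t - nT) for nT <= t <= (n + 1)T."""
    if t < 0:
        return 0.0
    n = min(int(t // T), len(symbols) - 1)      # index of the current symbol
    theta_n = math.pi * h * sum(symbols[:n])    # accumulated phase (Eq. 3.3-12)
    return theta_n + 2.0 * math.pi * h * symbols[n] * q(t - n * T, T)


if __name__ == "__main__":
    symbols = [1, -1, -1, 1]
    for step in range(0, 41, 5):
        t = step / 10.0
        print(t, cpfsk_phase(t, symbols, h=0.5))
```

Evaluating the phase just before and just after a symbol boundary gives the same value, which is the phase continuity the text describes.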


3.3-2 Continuous-Phase Modulation (CPM) 

When expressed in the form of Equation 3.3-10, CPFSK becomes a special case of
a general class of continuous-phase modulated (CPM) signals in which the carrier
phase is

$$\phi(t; \boldsymbol{I}) = 2\pi\sum_{k=-\infty}^{n} I_k h_k\,q(t - kT), \qquad nT \le t \le (n + 1)T \tag{3.3-14}$$

where $\{I_k\}$ is the sequence of M-ary information symbols selected from the alphabet
±1, ±3, ..., ±(M − 1), $\{h_k\}$ is a sequence of modulation indices, and q(t) is some
normalized waveform shape. When $h_k = h$ for all k, the modulation index is fixed
for all symbols. When the modulation index varies from one symbol to another, the
signal is called multi-h CPM. In such a case, the $\{h_k\}$ are made to vary in a cyclic
manner through a set of indices. The waveform q(t) may be represented in general as
the integral of some pulse g(t), i.e.,

$$q(t) = \int_0^t g(\tau)\,d\tau \tag{3.3-15}$$

If g(t) = 0 for t > T, the signal is called full-response CPM. If g(t) ≠ 0 for t > T, the
modulated signal is called partial-response CPM. Figure 3.3-4 illustrates several pulse
shapes for g(t) and the corresponding q(t). It is apparent that an infinite variety of CPM
signals can be generated by choosing different pulse shapes g(t) and by varying the
modulation index h and the alphabet size M. We note that the CPM signal has memory
that is introduced through the phase continuity.

Three popular pulse shapes are given in Table 3.3-1. LREC denotes a rectangular
pulse of duration LT, where L is a positive integer. In this case, L = 1 results in a
CPFSK signal, with the pulse as shown in Figure 3.3-4(a). The LREC pulse for L = 2
is shown in Figure 3.3-4(c). LRC denotes a raised cosine pulse of duration LT. The
LRC pulses corresponding to L = 1 and L = 2 are shown in Figure 3.3-4(b) and (d),
respectively. For L > 1, additional memory is introduced in the CPM signal by the
pulse g(t).

The third pulse given in Table 3.3-1 is called a Gaussian minimum-shift keying
(GMSK) pulse with bandwidth parameter B, which represents the −3-dB bandwidth
of the Gaussian pulse. Figure 3.3-4(e) illustrates a set of GMSK pulses with
time-bandwidth products BT ranging from 0.1 to 1. We observe that the pulse duration
increases as the bandwidth of the pulse decreases, as expected. In practical applications,
the pulse is usually truncated to some specified fixed duration. GMSK with BT = 0.3
is used in the European digital cellular communication system, called GSM. From
Figure 3.3-4(e) we observe that when BT = 0.3, the GMSK pulse may be truncated
at |t| = 1.5T with a relatively small error incurred for t > 1.5T.


FIGURE 3.3-4

Pulse shapes for full-response CPM (a, b) and partial-response CPM (c, d), and GMSK (e).


■ TABLE 3.3-1

Some Commonly Used CPM Pulse Shapes

FIGURE 3.3-5

Phase trajectory for binary CPFSK. 


It is instructive to sketch the set of phase trajectories $\phi(t; \boldsymbol{I})$ generated by all
possible values of the information sequence $\{I_n\}$. For example, in the case of CPFSK
with binary symbols $I_n = \pm 1$, the set of phase trajectories beginning at time t = 0 is
shown in Figure 3.3-5. For comparison, the phase trajectories for quaternary CPFSK
are illustrated in Figure 3.3-6.

These phase diagrams are called phase trees. We observe that the phase trees
for CPFSK are piecewise linear as a consequence of the fact that the pulse g(t) is
rectangular. Smoother phase trajectories and phase trees are obtained by using pulses
that do not contain discontinuities, such as the class of raised cosine pulses. For example,
a phase trajectory generated by the sequence (1, −1, −1, −1, 1, 1, −1, 1) for a
partial-response, raised cosine pulse of length 3T is illustrated in Figure 3.3-7. For comparison,
the corresponding phase trajectory generated by CPFSK is also shown.

The phase trees shown in these figures grow with time. However, the phase of the
carrier is unique only in the range from φ = 0 to φ = 2π or, equivalently, from φ = −π
to φ = π. When the phase trajectories are plotted modulo 2π, say, in the range (−π, π),
the phase tree collapses into a structure called a phase trellis. To properly view the phase
trellis diagram, we may plot the two quadrature components $x_i(t; \boldsymbol{I}) = \cos\phi(t; \boldsymbol{I})$ and
$x_q(t; \boldsymbol{I}) = \sin\phi(t; \boldsymbol{I})$ as functions of time. Thus, we generate a three-dimensional plot
in which the quadrature components $x_i$ and $x_q$ appear on the surface of a cylinder of
unit radius. For example, Figure 3.3-8 illustrates the phase trellis or phase cylinder
obtained with binary modulation, a modulation index h = ½, and a raised cosine pulse
of length 3T.

Simpler representations for the phase trajectories can be obtained by displaying
only the terminal values of the signal phase at the time instants t = nT. In this case,
we restrict the modulation index of the CPM signal to be rational. In particular, let us
assume that h = m/p, where m and p are relatively prime integers. Then a full-response

FIGURE 3.3-6

Phase trajectory for quaternary CPFSK.


CPM signal at the time instants t = nT will have the terminal phase states

$$\Theta_s = \left\{0, \frac{\pi m}{p}, \frac{2\pi m}{p}, \ldots, \frac{(p - 1)\pi m}{p}\right\} \tag{3.3-16}$$

when m is even and

$$\Theta_s = \left\{0, \frac{\pi m}{p}, \frac{2\pi m}{p}, \ldots, \frac{(2p - 1)\pi m}{p}\right\} \tag{3.3-17}$$

FIGURE 3.3-7

Phase trajectories for binary CPFSK (dashed) and binary, partial-response CPM based on
raised cosine pulse of length 3T (solid). [Source: Sundberg (1986), © 1986 IEEE]



FIGURE 3.3-8

Phase cylinder for binary CPM with h = ½ and a raised
cosine pulse of length 3T. [Source: Sundberg (1986),

© 1986 IEEE]


when m is odd. Hence, there are p terminal phase states when m is even and 2p states
when m is odd. On the other hand, when the pulse shape extends over L symbol intervals
(partial-response CPM), the number of phase states may increase up to a maximum of
$S_t$, where

$$S_t = \begin{cases} pM^{L-1} & \text{even } m \\ 2pM^{L-1} & \text{odd } m \end{cases} \tag{3.3-18}$$


where M is the alphabet size. For example, the binary CPFSK signal (full-response,
rectangular pulse) with h = ½ has $S_t = 4$ (terminal) phase states. The state trellis for
this signal is illustrated in Figure 3.3-9. We emphasize that the phase transitions from
one state to another are not true phase trajectories. They represent phase transitions for
the (terminal) states at the time instants t = nT.
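
The counting rule in Equations 3.3-16 to 3.3-18 can be checked by enumeration. The sketch below lists the terminal phase states for h = m/p in units of π, reduced modulo 2 (i.e., modulo 2π in phase), using exact rational arithmetic. The function names are assumptions of this example, and m and p are assumed to be relatively prime, as in the text.

```python
from fractions import Fraction


def terminal_phase_states(m, p):
    """Terminal phase states for h = m/p, in units of pi and reduced modulo 2."""
    count = p if m % 2 == 0 else 2 * p           # Eq. 3.3-16 vs. Eq. 3.3-17
    return sorted({Fraction(k * m, p) % 2 for k in range(count)})


def max_phase_states(m, p, M, L):
    """Upper bound S_t on the number of states for pulse length L (Eq. 3.3-18)."""
    base = p if m % 2 == 0 else 2 * p
    return base * M ** (L - 1)


if __name__ == "__main__":
    # Binary CPFSK with h = 1/2: states 0, pi/2, pi, 3*pi/2.
    print(terminal_phase_states(1, 2))
    print(max_phase_states(1, 2, M=2, L=1))
```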

An alternative representation to the state trellis is the state diagram, which also 
illustrates the state transitions at the time instants t = nT. This is an even more 
compact representation of the CPM signal characteristics. Only the possible (terminal) 
phase states and their transitions are displayed in the state diagram. Time does not 
appear explicitly as a variable. For example, the state diagram for the CPFSK signal
with h = ½ is shown in Figure 3.3-10.

FIGURE 3.3-9

State trellis for binary CPFSK with h = ½.



FIGURE 3.3-10

State diagram for binary CPFSK with h = ½.


Minimum-Shift Keying (MSK) 

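MSK is a special form of binary CPFSK (and, therefore, CPM) in which the modulation
index h = ½ and g(t) is a rectangular pulse of duration T. The phase of the carrier in
the interval nT ≤ t ≤ (n + 1)T is

$$\begin{aligned}
\phi(t; \boldsymbol{I}) &= \frac{1}{2}\pi\sum_{k=-\infty}^{n-1} I_k + \pi I_n\,q(t - nT) \\
&= \theta_n + \frac{1}{2}\pi I_n\left(\frac{t - nT}{T}\right), \qquad nT \le t \le (n + 1)T
\end{aligned} \tag{3.3-19}$$

and the modulated carrier signal is

$$\begin{aligned}
s(t) &= A\cos\left[2\pi f_c t + \theta_n + \frac{1}{2}\pi I_n\left(\frac{t - nT}{T}\right)\right] \\
&= A\cos\left[2\pi\left(f_c + \frac{1}{4T}I_n\right)t - \frac{1}{2}n\pi I_n + \theta_n\right], \qquad nT \le t \le (n + 1)T
\end{aligned} \tag{3.3-20}$$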


Equation 3.3-20 indicates that the binary CPFSK signal can be expressed as a
sinusoid having one of two possible frequencies in the interval nT ≤ t ≤ (n + 1)T. If
we define these frequencies as

$$\begin{aligned}
f_1 &= f_c - \frac{1}{4T} \\
f_2 &= f_c + \frac{1}{4T}
\end{aligned} \tag{3.3-21}$$

then the binary CPFSK signal given by Equation 3.3-20 may be written in the form

$$s_i(t) = A\cos\left[2\pi f_i t + \theta_n + \frac{1}{2}n\pi(-1)^{i-1}\right], \qquad i = 1, 2 \tag{3.3-22}$$


which represents an FSK signal with frequency separation of $\Delta f = f_2 - f_1 = 1/2T$.
From the discussion following Equation 3.2-58 we recall that Δf = 1/2T is the
minimum frequency separation that is necessary to ensure the orthogonality of signals $s_1(t)$
and $s_2(t)$ over a signaling interval of length T. This explains why binary CPFSK with
h = ½ is called minimum-shift keying (MSK). The phase in the nth signaling interval is
the phase state of the signal that results in phase continuity between adjacent intervals.
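
The equivalence of the two forms in Equation 3.3-20 can be confirmed numerically. The sketch below evaluates the phase form and the frequency form of the MSK carrier for both symbol values and several intervals; the function names and the values of $f_c$, T, and $\theta_n$ used in the demonstration are arbitrary assumptions.

```python
import math


def msk_phase_form(t, n, I_n, theta_n, fc, T, A=1.0):
    """First line of Eq. 3.3-20: the phase grows linearly within the interval."""
    return A * math.cos(2 * math.pi * fc * t + theta_n
                        + 0.5 * math.pi * I_n * (t - n * T) / T)


def msk_frequency_form(t, n, I_n, theta_n, fc, T, A=1.0):
    """Second line of Eq. 3.3-20: a sinusoid at fc + I_n / (4T)."""
    return A * math.cos(2 * math.pi * (fc + I_n / (4 * T)) * t
                        - 0.5 * n * math.pi * I_n + theta_n)


def msk_tone_frequencies(fc, T):
    """The two tone frequencies f_1 and f_2 of Eq. 3.3-21."""
    return fc - 1 / (4 * T), fc + 1 / (4 * T)


if __name__ == "__main__":
    fc, T, theta_n = 5.0, 1.0, 0.3
    f1, f2 = msk_tone_frequencies(fc, T)
    print("frequency separation:", f2 - f1)
    t, n = 2.4, 2
    print(msk_phase_form(t, n, 1, theta_n, fc, T),
          msk_frequency_form(t, n, 1, theta_n, fc, T))
```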


Offset QPSK (OQPSK) 

Consider the QPSK system with constellation shown in Figure 3.3-11. In this system
each pair of information bits is mapped into one of the constellation points. The constellation
and one possible mapping of bit sequences of length 2 are shown in Figure 3.3-11.

Now assume we are interested in transmitting the binary sequence 11000111. To
do this, we can split this sequence into binary sequences 11, 00, 01, and 11 and transmit
the corresponding points in the constellation. The first bit in each binary sequence
determines the in-phase (I) component of the baseband signal with a duration $2T_b$, and
the second bit determines the quadrature (Q) component of it, again of duration $2T_b$.
The in-phase and quadrature components for this bit sequence are shown in
Figure 3.3-12. Note that changes can occur only at even multiples of $T_b$, and there are instances at
which both I and Q components change simultaneously, resulting in a change of 180°
in the phase, for instance, at $t = 2T_b$ in Figure 3.3-12. The possible phase transitions
for QPSK signals, which can occur only at time instants of the form $nT_b$, where n is
even, are shown in Figure 3.3-13.

FIGURE 3.3-11

A possible mapping for QPSK signal.


FIGURE 3.3-12

The in-phase and quadrature components for QPSK.


FIGURE 3.3-13

Possible phase transitions in QPSK signaling.


To prevent 180° phase changes that cause abrupt changes in the signal, resulting
in large spectral side lobes, a version of QPSK, known as offset QPSK (OQPSK),
or staggered QPSK (SQPSK), is introduced. In OQPSK, the in-phase and quadrature
components of the standard QPSK are misaligned by $T_b$. The in-phase and quadrature
components for the sequence 11000111 are shown in Figure 3.3-14. Misalignment of
the in-phase and quadrature components prevents both components from changing at the
same time and thus prevents phase transitions of 180°. This reduces the abrupt jumps
in the modulated signal. The absence of 180° phase jumps is, however, offset by more
frequent ±90° phase shifts. The overall effect is that, as we will see later, standard
QPSK and OQPSK have the same power spectral density. The phase transition diagram
for OQPSK is shown in Figure 3.3-15.
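The effect of the $T_b$ offset on the phase jumps can be illustrated with the bit sequence 11000111 used above. This is a sketch under stated assumptions: bit 1 is mapped to level +1 and bit 0 to level -1, and the phase table `PHASE_DEG` is one hypothetical labeling of the four quadrants (the mapping of Figure 3.3-11 may differ, which would not change the size of the jumps). Time is divided into slots of length $T_b$.

```python
# Hypothetical quadrant labeling: (I level, Q level) -> carrier phase in degrees.
PHASE_DEG = {(1, 1): 45, (-1, 1): 135, (-1, -1): 225, (1, -1): 315}

def to_levels(bits):
    """Map the characters '1'/'0' to the levels +1/-1 (an assumed mapping)."""
    return [1 if b == "1" else -1 for b in bits]

def phase_jumps(i_slots, q_slots):
    """Absolute phase change (degrees, at most 180) between consecutive slots."""
    phases = [PHASE_DEG[(i, q)] for i, q in zip(i_slots, q_slots)]
    jumps = []
    for a, b in zip(phases, phases[1:]):
        d = (b - a) % 360
        jumps.append(min(d, 360 - d))
    return jumps

bits = "11000111"
i_sym = to_levels(bits[0::2])   # first bit of each pair  -> in-phase rail
q_sym = to_levels(bits[1::2])   # second bit of each pair -> quadrature rail

# Each symbol lasts 2*Tb, i.e. two slots of length Tb.
i_slots = [v for v in i_sym for _ in range(2)]
q_slots = [v for v in q_sym for _ in range(2)]

qpsk_jumps = phase_jumps(i_slots, q_slots)
# OQPSK: delay the quadrature rail by one slot; keep only slots where both rails exist.
oqpsk_jumps = phase_jumps(i_slots[1:], q_slots[:-1])
```

For this sequence the aligned rails produce one 180° jump (at $t = 2T_b$), whereas the staggered rails never change by more than 90° but change more often.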

The OQPSK signal can be written as

$$s(t) = A\left[\left(\sum_{n=-\infty}^{\infty} I_{2n}\, g(t-2nT)\right)\cos 2\pi f_c t + \left(\sum_{n=-\infty}^{\infty} I_{2n+1}\, g(t-2nT-T)\right)\sin 2\pi f_c t\right] \tag{3.3-23}$$

FIGURE 3.3-14

The in-phase and quadrature components for OQPSK signaling. 



FIGURE 3.3-15 

Phase transition diagram for OQPSK signaling. 


with the lowpass equivalent of

$$s_l(t) = A\left[\sum_{n=-\infty}^{\infty} I_{2n}\, g(t-2nT) - j\sum_{n=-\infty}^{\infty} I_{2n+1}\, g(t-2nT-T)\right] \tag{3.3-24}$$


MSK may also be represented as a form of OQPSK. Specifically, we may express
(see Problem 3.26 and Example 3.3-1) the equivalent lowpass digitally modulated
MSK signal in the form of Equation 3.3-24 with

$$g(t) = \begin{cases} \sin\dfrac{\pi t}{2T} & 0 \le t \le 2T \\ 0 & \text{otherwise} \end{cases} \tag{3.3-25}$$


Figure 3.3-16 illustrates the representation of an MSK signal as two staggered 
quadrature-modulated binary PSK signals. The corresponding sum of the two quadra- 
ture signals is a constant-amplitude, frequency-modulated signal. 
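The constant-amplitude property can be checked directly from Equations 3.3-24 and 3.3-25. The sketch below is illustrative only: the symbol sequence is hypothetical, $A = 1$ and $T = 1$ are assumed, and the samples are restricted to the interval in which both staggered rails carry a pulse.

```python
import math

def g(t, T=1.0):
    """Half-sinusoid pulse of Equation 3.3-25."""
    return math.sin(math.pi * t / (2 * T)) if 0.0 <= t <= 2 * T else 0.0

def lowpass_msk(t, symbols, T=1.0):
    """s_l(t)/A from Equation 3.3-24: even-indexed symbols drive the
    in-phase rail, odd-indexed symbols the quadrature rail (delayed by T)."""
    pairs = len(symbols) // 2
    re = sum(symbols[2 * n] * g(t - 2 * n * T, T) for n in range(pairs))
    im = -sum(symbols[2 * n + 1] * g(t - 2 * n * T - T, T) for n in range(pairs))
    return complex(re, im)

symbols = [1, -1, -1, 1, 1, 1, -1, -1]   # hypothetical data, I_k = +/-1
# Both rails are active for T < t < 8T; sample well inside that range.
envelope = [abs(lowpass_msk(1.013 + 0.05 * k, symbols)) for k in range(119)]
```

Every sample of `envelope` equals 1 to within rounding, because at any instant one rail contributes $|\sin|$ and the other $|\cos|$ of the same angle.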

It is also interesting to compare the waveforms for MSK with offset QPSK in which
the pulse $g(t)$ is rectangular for $0 \le t \le 2T$, and with conventional QPSK in which the
pulse $g(t)$ is rectangular for $0 \le t \le 2T$. Clearly, all three of the modulation methods
result in identical data rates. The MSK signal has continuous phase; therefore, there
exist no jumps in its waveform. However, since it is essentially a frequency modulation
system, there are jumps in its instantaneous frequency. The offset QPSK signal with
a rectangular pulse is basically two binary PSK signals for which the phase transitions
are staggered in time by $T$ seconds. Thus, the signal contains phase jumps of
±90° that may occur as often as every $T$ seconds. OQPSK is a signaling scheme with
constant frequency, but there exist jumps in its waveform. On the other hand, the
conventional four-phase PSK signal with constant amplitude will contain phase jumps of
±180° or ±90° every $2T$ seconds. An illustration of these three signal types is given in
Figure 3.3-17.

FIGURE 3.3-16
Representation of MSK as an OQPSK signal with a sinusoidal envelope. Panel (c): MSK signal [sum of (a) and (b)].

QPSK signaling with rectangular pulses has constant envelope, but in practice 
filtered pulse shapes like the raised cosine signal are preferred and are more widely 
employed. When filtered pulse shapes are used, the QPSK signal will not be a constant- 
envelope modulation scheme, and the 180° phase shifts cause the envelope to pass 
through zero. Nonconstant envelope signals are not desirable particularly when used 
with nonlinear devices such as class C amplifiers or TWTs. In such cases OQPSK is a 
useful alternative to QPSK. 

In MSK the phase is continuous — since it is a special case of CPFSK — but the 
frequency has jumps in it. If these jumps are smoothed, the spectrum will be more com- 
pact. GMSK signaling discussed earlier in this chapter and summarized in Table 3.3-1 
is a signaling scheme that addresses this problem by shaping the lowpass binary signal 
before being applied to the MSK modulator and therefore results in smoother transi- 
tions in frequency between signaling intervals. This results in more compact spectral 
characteristics. The baseband signal is shaped in GMSK, but since the shaping occurs 
before modulation, the resulting modulated signal will be of constant envelope.

FIGURE 3.3-17

MSK, OQPSK, and QPSK signals. 

Linear Representation of CPM Signals 

As described above, CPM is a nonlinear modulation technique with memory. However, 
CPM may also be represented as a linear superposition of signal waveforms. Such a 
representation provides an alternative method for generating the modulated signal at the 
transmitter and/or demodulating the signal at the receiver. Following the development 
originally given by Laurent (1986), we demonstrate that binary CPM may be represented
by a linear superposition of a finite number of amplitude-modulated pulses, provided
that the pulse $g(t)$ is of finite duration $LT$, where $T$ is the bit interval. We begin with
the equivalent lowpass representation of CPM, which is

$$v(t) = e^{j\phi(t;\boldsymbol{I})}, \qquad nT \le t \le (n+1)T \tag{3.3-26}$$

where

$$\begin{aligned} \phi(t;\boldsymbol{I}) &= 2\pi h \sum_{k=-\infty}^{n} I_k\, q(t-kT), \qquad nT \le t \le (n+1)T \\ &= \pi h \sum_{k=-\infty}^{n-L} I_k + 2\pi h \sum_{k=n-L+1}^{n} I_k\, q(t-kT) \end{aligned} \tag{3.3-27}$$

and $q(t)$ is the integral of the pulse $g(t)$, as previously defined in Equation 3.3-15. The
exponential term $\exp[j\phi(t;\boldsymbol{I})]$ may be expressed as

$$\exp[j\phi(t;\boldsymbol{I})] = \exp\left(j\pi h \sum_{k=-\infty}^{n-L} I_k\right) \prod_{k=0}^{L-1} \exp\left\{j2\pi h I_{n-k}\, q[t-(n-k)T]\right\} \tag{3.3-28}$$


Note that the first term on the right-hand side of Equation 3.3-28 represents the
cumulative phase up to the information symbol $I_{n-L}$, and the second term consists of a
product of $L$ phase terms. Assuming that the modulation index $h$ is not an integer and
the data symbols are binary, i.e., $I_k = \pm 1$, the $k$th phase term may be expressed as

$$\begin{aligned} \exp\left\{j2\pi h I_{n-k}\, q[t-(n-k)T]\right\} &= \frac{\sin \pi h}{\sin \pi h} \exp\left\{j2\pi h I_{n-k}\, q[t-(n-k)T]\right\} \\ &= \frac{\sin\left\{\pi h - 2\pi h\, q[t-(n-k)T]\right\}}{\sin \pi h} + \exp(j\pi h I_{n-k})\, \frac{\sin\left\{2\pi h\, q[t-(n-k)T]\right\}}{\sin \pi h} \end{aligned} \tag{3.3-29}$$

It is convenient to define the signal pulse $s_0(t)$ as

$$s_0(t) = \begin{cases} \dfrac{\sin 2\pi h q(t)}{\sin \pi h} & 0 \le t \le LT \\[2ex] \dfrac{\sin[\pi h - 2\pi h q(t-LT)]}{\sin \pi h} & LT \le t \le 2LT \\[2ex] 0 & \text{otherwise} \end{cases} \tag{3.3-30}$$

Then

$$\exp[j\phi(t;\boldsymbol{I})] = \exp\left(j\pi h \sum_{k=-\infty}^{n-L} I_k\right) \prod_{k=0}^{L-1} \left\{ s_0[t+(k+L-n)T] + \exp(j\pi h I_{n-k})\, s_0[t-(n-k)T] \right\} \tag{3.3-31}$$
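The identity in Equation 3.3-29 holds only because the symbols are binary, which is easy to confirm numerically. The sketch below is illustrative: the grid of $h$ and $q$ values is an arbitrary choice, with $q$ restricted to $[0, \frac{1}{2}]$ and $h$ non-integer so that $\sin \pi h \ne 0$.

```python
import cmath
import math

def lhs(h, I, q):
    """Left-hand side of Equation 3.3-29 for symbol I and phase-pulse value q."""
    return cmath.exp(1j * 2 * math.pi * h * I * q)

def rhs(h, I, q):
    """Right-hand side of Equation 3.3-29 (requires sin(pi*h) != 0)."""
    s = math.sin(math.pi * h)
    return (math.sin(math.pi * h - 2 * math.pi * h * q) / s
            + cmath.exp(1j * math.pi * h * I) * math.sin(2 * math.pi * h * q) / s)

max_err = max(abs(lhs(h, I, q) - rhs(h, I, q))
              for h in (0.25, 0.5, 0.7)
              for I in (-1, 1)
              for q in [k / 20 for k in range(11)])   # q from 0 to 1/2

# For a non-binary symbol such as I = 3 the two sides no longer agree.
nonbinary_err = abs(lhs(0.25, 3, 0.3) - rhs(0.25, 3, 0.3))
```

The maximum error over the binary cases is at rounding level, while the non-binary case shows a clear mismatch, which is why the development here is restricted to binary CPM.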


By performing the multiplication over the $L$ terms in the product, we obtain a sum
of $2^L$ terms, where $2^{L-1}$ terms are distinct and the other $2^{L-1}$ terms are time-shifted
versions of the distinct terms. The final result may be expressed as

$$\exp[j\phi(t;\boldsymbol{I})] = \sum_{n} \sum_{k=0}^{2^{L-1}-1} e^{j\pi h A_{k,n}}\, c_k(t-nT) \tag{3.3-32}$$

where the pulses $c_k(t)$, for $0 \le k \le 2^{L-1}-1$, are defined as

$$c_k(t) = s_0(t) \prod_{n=1}^{L-1} s_0[t+(n+La_{k,n})T], \qquad 0 \le t \le T \times \min_{n}\,[L(2-a_{k,n})-n] \tag{3.3-33}$$

and each pulse is weighted by a complex coefficient $\exp(j\pi h A_{k,n})$, where

$$A_{k,n} = \sum_{m=-\infty}^{n} I_m - \sum_{m=1}^{L-1} I_{n-m}\, a_{k,m} \tag{3.3-34}$$

and the $\{a_{k,m} = 0 \text{ or } 1\}$ are the coefficients in the binary representation of the index $k$,
i.e.,

$$k = \sum_{m=1}^{L-1} 2^{m-1} a_{k,m}, \qquad k = 0, 1, \ldots, 2^{L-1}-1 \tag{3.3-35}$$

Thus, the binary CPM signal is expressed as a weighted sum of $2^{L-1}$ real-valued pulses
$\{c_k(t)\}$.

In this representation of CPM as a superposition of amplitude-modulated pulses,
the pulse $c_0(t)$ is the most important component, because its duration is the longest
and it contains the most significant part of the signal energy. Consequently, a simple
approximation to a CPM signal is a partial-response PAM signal having $c_0(t)$ as the
basic pulse shape.

The focus for the above development was binary CPM. A representation of $M$-ary
CPM as a superposition of PAM waveforms has been described by Mengali and Morelli
(1995).

EXAMPLE 3.3-1. As a special case, let us consider the MSK signal, for which $h = \frac{1}{2}$
and $g(t)$ is a rectangular pulse of duration $T$. In this case,

$$\begin{aligned} \phi(t;\boldsymbol{I}) &= \frac{\pi}{2} \sum_{k=-\infty}^{n-1} I_k + \pi I_n\, q(t-nT) \\ &= \theta_n + \frac{\pi}{2} I_n \left(\frac{t-nT}{T}\right), \qquad nT \le t \le (n+1)T \end{aligned}$$

and

$$\exp[j\phi(t;\boldsymbol{I})] = \sum_{n} b_n\, c_0(t-nT)$$

where

$$c_0(t) = \begin{cases} \sin\dfrac{\pi t}{2T} & 0 \le t \le 2T \\ 0 & \text{otherwise} \end{cases}$$

and

$$b_n = e^{j\pi A_{0,n}/2} = e^{j(\theta_n + \pi I_n/2)}$$

The complex-valued modified data sequence $\{b_n\}$ may be expressed recursively as

$$b_n = j\, b_{n-1} I_n$$

so that $b_n$ alternates in taking real and imaginary values. By separating the real and the
imaginary components, we obtain the equivalent lowpass signal representation given
by Equations 3.3-24 and 3.3-25.
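The result of Example 3.3-1 can be verified numerically by comparing the directly generated MSK phasor with the single-pulse superposition. This is a sketch under stated assumptions: the data sequence is hypothetical and starts at $n = 0$ with zero accumulated phase ($\theta_0 = 0$, so $b_{-1} = 1$), and $T = 1$.

```python
import cmath
import math

def c0(t, T=1.0):
    """Pulse c_0(t) of Example 3.3-1: a half sinusoid of duration 2T."""
    return math.sin(math.pi * t / (2 * T)) if 0.0 <= t <= 2 * T else 0.0

def msk_direct(t, symbols, T=1.0):
    """exp(j*phi(t)): the phase ramps by (pi/2)*I_n across interval n."""
    n = min(int(t // T), len(symbols) - 1)
    theta_n = (math.pi / 2) * sum(symbols[:n])
    return cmath.exp(1j * (theta_n + (math.pi / 2) * symbols[n] * (t - n * T) / T))

def msk_superposition(t, symbols, T=1.0):
    """Sum of b_n * c_0(t - nT) with the recursion b_n = j * b_{n-1} * I_n."""
    b = 1 + 0j                      # b_{-1}, assuming theta_0 = 0
    total = b * c0(t + T, T)        # tail of the n = -1 pulse
    for n, symbol in enumerate(symbols):
        b = 1j * b * symbol
        total += b * c0(t - n * T, T)
    return total

symbols = [1, -1, -1, 1, 1]         # hypothetical data, I_n = +/-1
times = [0.01 + 0.037 * k for k in range(135)]   # all inside [0, 5T)
max_err = max(abs(msk_direct(t, symbols) - msk_superposition(t, symbols))
              for t in times)
```

Within each interval only two shifted copies of $c_0(t)$ overlap, one weighted by $b_{n-1}$ and one by $b_n$, and their sum reproduces the unit-magnitude phasor exactly.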
3.4 POWER SPECTRUM OF DIGITALLY MODULATED SIGNALS

In this section we study the power spectral density of digitally modulated signals. 
The information about the power spectral density helps us determine the required 
transmission bandwidth of these modulation schemes and their bandwidth efficiency. 
We start by considering a general modulation scheme with memory in which the current 
transmitted signal can depend on the entire history of the information sequence and 
then specialize this general formulation to the cases where the modulation system has a 
finite memory, the case where the modulation is linear, and when the modulated signal 
can be determined by the state of a Markov chain. We conclude this section with the 
spectral characteristics of CPM and CPFSK signals. 

3.4-1 Power Spectral Density of a Digitally Modulated Signal with Memory 

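Here we assume that the bandpass modulated signal is denoted by $v(t)$ with a lowpass
equivalent signal of the form

$$v_l(t) = \sum_{n=-\infty}^{\infty} s_l(t-nT; \boldsymbol{I}_n) \tag{3.4-1}$$

Here $s_l(t; \boldsymbol{I}_n) \in \{s_{1l}(t), s_{2l}(t), \ldots, s_{Ml}(t)\}$ is one of the possible $M$ lowpass equivalent
signals determined by the information sequence up to time $n$, denoted by $\boldsymbol{I}_n =
(\ldots, I_{n-2}, I_{n-1}, I_n)$. We assume that $I_n$ is a stationary process. Our goal here is to determine
the power spectral density of $v(t)$. This is done by first deriving the power spectral
density of $v_l(t)$ and using Equation 2.9-14 to obtain the power spectral density of $v(t)$.

We first determine the autocorrelation function of $v_l(t)$.

$$\begin{aligned} R_{v_l}(t+\tau, t) &= E\left[v_l(t+\tau)\, v_l^*(t)\right] \\ &= \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} E\left[s_l(t+\tau-nT; \boldsymbol{I}_n)\, s_l^*(t-mT; \boldsymbol{I}_m)\right] \end{aligned} \tag{3.4-2}$$

Changing $t$ to $t + T$ does not change the mean and the autocorrelation function of $v_l(t)$,
hence $v_l(t)$ is a cyclostationary process; to determine its power spectral density, we
have to average $R_{v_l}(t+\tau, t)$ over one period $T$. We have (with a change of variable of
$k = n - m$)

$$\begin{aligned} \bar{R}_{v_l}(\tau) &= \frac{1}{T} \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \int_0^T E\left[s_l(t+\tau-mT-kT; \boldsymbol{I}_{m+k})\, s_l^*(t-mT; \boldsymbol{I}_m)\right] dt \\ &\overset{(a)}{=} \frac{1}{T} \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \int_{-mT}^{-(m-1)T} E\left[s_l(u+\tau-kT; \boldsymbol{I}_k)\, s_l^*(u; \boldsymbol{I}_0)\right] du \\ &= \frac{1}{T} \sum_{k=-\infty}^{\infty} \int_{-\infty}^{\infty} E\left[s_l(u+\tau-kT; \boldsymbol{I}_k)\, s_l^*(u; \boldsymbol{I}_0)\right] du \end{aligned} \tag{3.4-3}$$

where in (a) we have introduced a change of variable of the form $u = t - mT$ and we
have used the fact that the Markov chain is in the steady state and the input process $\{I_n\}$
is stationary. Defining

$$g_k(\tau) = \int_{-\infty}^{\infty} E\left[s_l(t+\tau; \boldsymbol{I}_k)\, s_l^*(t; \boldsymbol{I}_0)\right] dt \tag{3.4-4}$$

we can write Equation 3.4-3 as

$$\bar{R}_{v_l}(\tau) = \frac{1}{T} \sum_{k=-\infty}^{\infty} g_k(\tau - kT) \tag{3.4-5}$$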


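The power spectral density of $v_l(t)$, which is the Fourier transform of $\bar{R}_{v_l}(\tau)$, is
therefore given by

$$\begin{aligned} S_{v_l}(f) &= \frac{1}{T}\, \mathcal{F}\left[\sum_{k=-\infty}^{\infty} g_k(\tau - kT)\right] \\ &= \frac{1}{T} \sum_{k=-\infty}^{\infty} G_k(f)\, e^{-j2\pi kfT} \end{aligned} \tag{3.4-6}$$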


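where $G_k(f)$ denotes the Fourier transform of $g_k(\tau)$. We can also express $G_k(f)$ in the
following form:

$$\begin{aligned} G_k(f) &= \mathcal{F}\left[\int_{-\infty}^{\infty} E\left[s_l(t+\tau; \boldsymbol{I}_k)\, s_l^*(t; \boldsymbol{I}_0)\right] dt\right] \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\left[s_l(t+\tau; \boldsymbol{I}_k)\, s_l^*(t; \boldsymbol{I}_0)\right] e^{-j2\pi f\tau}\, dt\, d\tau \\ &= E\left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} s_l(t+\tau; \boldsymbol{I}_k)\, e^{-j2\pi f(t+\tau)}\, s_l^*(t; \boldsymbol{I}_0)\, e^{j2\pi ft}\, dt\, d\tau\right] \\ &= E\left[S_l(f; \boldsymbol{I}_k)\, S_l^*(f; \boldsymbol{I}_0)\right] \end{aligned} \tag{3.4-7}$$

where $S_l(f; \boldsymbol{I}_k)$ and $S_l(f; \boldsymbol{I}_0)$ are Fourier transforms of $s_l(t; \boldsymbol{I}_k)$ and $s_l(t; \boldsymbol{I}_0)$,
respectively.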

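From Equation 3.4-7, we conclude that $G_0(f) = E\left[|S_l(f; \boldsymbol{I}_0)|^2\right]$ is real, and
$G_{-k}(f) = G_k^*(f)$ for $k \ge 1$. If we define

$$G_k'(f) = G_k(f) - G_0(f) \tag{3.4-8}$$

we can readily see that

$$G_{-k}'(f) = G_k^{\prime\,*}(f), \qquad G_0'(f) = 0 \tag{3.4-9}$$

Equation 3.4-6 can be written as

$$\begin{aligned} S_{v_l}(f) &= \frac{1}{T}\sum_{k=-\infty}^{\infty}\left(G_k(f)-G_0(f)\right)e^{-j2\pi kfT} + \frac{1}{T}\sum_{k=-\infty}^{\infty}G_0(f)\,e^{-j2\pi kfT} \\ &= \frac{1}{T}\sum_{k=-\infty}^{\infty}G_k'(f)\,e^{-j2\pi kfT} + \frac{1}{T^2}\sum_{k=-\infty}^{\infty}G_0(f)\,\delta\left(f-\frac{k}{T}\right) \\ &= \frac{2}{T}\,\mathrm{Re}\left[\sum_{k=1}^{\infty}G_k'(f)\,e^{-j2\pi kfT}\right] + \frac{1}{T^2}\sum_{k=-\infty}^{\infty}G_0\left(\frac{k}{T}\right)\delta\left(f-\frac{k}{T}\right) \\ &= S_{v_l}^{(c)}(f) + S_{v_l}^{(d)}(f) \end{aligned} \tag{3.4-10}$$

where we have used Equation 3.4-9 and the well-known relation

$$\sum_{k=-\infty}^{\infty} e^{j2\pi kfT} = \frac{1}{T}\sum_{k=-\infty}^{\infty}\delta\left(f-\frac{k}{T}\right) \tag{3.4-11}$$

$S_{v_l}^{(c)}(f)$ and $S_{v_l}^{(d)}(f)$, defined by

$$\begin{aligned} S_{v_l}^{(c)}(f) &= \frac{2}{T}\,\mathrm{Re}\left[\sum_{k=1}^{\infty}G_k'(f)\,e^{-j2\pi kfT}\right] \\ S_{v_l}^{(d)}(f) &= \frac{1}{T^2}\sum_{k=-\infty}^{\infty}G_0\left(\frac{k}{T}\right)\delta\left(f-\frac{k}{T}\right) \end{aligned} \tag{3.4-12}$$

represent the continuous and the discrete components of the power spectral density
of $v_l(t)$.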


3.4-2 Power Spectral Density of Linearly Modulated Signals 

In linearly modulated signals, which include ASK, PSK, and QAM as special cases,
the lowpass equivalent of the modulated signal is of the form

$$v_l(t) = \sum_{n=-\infty}^{\infty} I_n\, g(t-nT) \tag{3.4-13}$$

where $\{I_n\}$ is the stationary information sequence and $g(t)$ is the basic modulation
pulse. Comparing Equations 3.4-13 and 3.4-1, we have

$$s_l(t; \boldsymbol{I}_n) = I_n\, g(t) \tag{3.4-14}$$

from which

$$\begin{aligned} G_k(f) &= E\left[S_l(f; \boldsymbol{I}_k)\, S_l^*(f; \boldsymbol{I}_0)\right] \\ &= E\left[I_k I_0^*\, |G(f)|^2\right] \\ &= R_I(k)\, |G(f)|^2 \end{aligned} \tag{3.4-15}$$

where $R_I(k)$ represents the autocorrelation function of the information sequence $\{I_n\}$,
and $G(f)$ is the Fourier transform of $g(t)$. Using Equation 3.4-15 in Equation 3.4-6
yields

$$\begin{aligned} S_{v_l}(f) &= \frac{1}{T}\,|G(f)|^2 \sum_{k=-\infty}^{\infty} R_I(k)\, e^{-j2\pi kfT} \\ &= \frac{1}{T}\,|G(f)|^2\, S_I(f) \end{aligned} \tag{3.4-16}$$

where

$$S_I(f) = \sum_{k=-\infty}^{\infty} R_I(k)\, e^{-j2\pi kfT} \tag{3.4-17}$$

represents the power spectral density of the discrete-time random process $\{I_n\}$.

Note that two factors determine the shape of the power spectral density as given in 
Equation 3.4-16. The first factor is the shape of the basic pulse used for modulation. 
The shape of this pulse obviously has an important impact on the power spectral density 
of the modulated signal. Smoother pulses result in more compact power spectral
densities. Another factor that affects the power spectral density of the modulated signal is
the power spectral density of the information sequence $\{I_n\}$, which is determined by the
correlation properties of the information sequence. One method to control the power
spectral density of the modulated signal is through controlling the correlation
properties of the information sequence by passing it through an invertible linear filter prior
to modulation. This linear filter controls the correlation properties of the modulated 
signals, and since it is invertible, the original information sequence can be retrieved 
from it. This technique is called spectral shaping by precoding. 

For instance, we can employ a precoding of the form $J_n = I_n + \alpha I_{n-1}$, and by
changing the value of $\alpha$, we can control the power spectral density of the resulting
modulated waveform. In general, we can introduce a memory of length $L$ and define a
precoding of the form

$$J_n = \sum_{k=0}^{L} \alpha_k I_{n-k} \tag{3.4-18}$$

and then generate the modulated waveform

$$v_l(t) = \sum_{k=-\infty}^{\infty} J_k\, g(t-kT) \tag{3.4-19}$$

Since the precoding operation is a linear operation, the resulting power spectral
density is of the form

$$S_{v_l}(f) = \frac{1}{T}\,|G(f)|^2 \left|\sum_{k=0}^{L} \alpha_k\, e^{-j2\pi kfT}\right|^2 S_I(f) \tag{3.4-20}$$

Changing the $\alpha_k$'s controls the power spectral density.


EXAMPLE 3.4-1. In a binary communication system $I_n = \pm 1$ with equal probability,
and the $I_n$'s are independent. This information stream linearly modulates a basic pulse
of the form

$$g(t) = \Pi\left(\frac{t}{T}\right)$$

to generate

$$v(t) = \sum_{k=-\infty}^{\infty} I_k\, g(t-kT)$$

The power spectral density of the modulated signal will be of the form

$$S_v(f) = \frac{1}{T}\,|T\,\mathrm{sinc}(Tf)|^2\, S_I(f)$$

To determine $S_I(f)$, we need to find $R_I(k) = E\left[I_{n+k} I_n^*\right]$. By independence of the $\{I_n\}$
sequence we have

$$R_I(k) = \begin{cases} E\left[|I|^2\right] = 1 & k = 0 \\ E\left[I_{n+k}\right] E\left[I_n^*\right] = 0 & k \ne 0 \end{cases}$$

and hence

$$S_I(f) = \sum_{k=-\infty}^{\infty} R_I(k)\, e^{-j2\pi kfT} = 1$$

Thus,

$$S_v(f) = T\,\mathrm{sinc}^2(Tf)$$

A precoding of the form

$$J_n = I_n + \alpha I_{n-1}$$

where $\alpha$ is real would result in a power spectral density of the form

$$S_v(f) = T\,\mathrm{sinc}^2(Tf)\left|1 + \alpha e^{-j2\pi fT}\right|^2$$

or

$$S_v(f) = T\,\mathrm{sinc}^2(Tf)\left(1 + \alpha^2 + 2\alpha\cos(2\pi fT)\right)$$

Choosing $\alpha = 1$ would result in a power spectral density that has a null at frequency
$f = \frac{1}{2T}$. Note that this spectral null is independent of the shape of the basic pulse $g(t)$;
that is, any other $g(t)$ having a precoding of the form $J_n = I_n + I_{n-1}$ will result in a
spectral null at $f = \frac{1}{2T}$.
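The two forms of the precoded spectrum in Example 3.4-1, and the location of the null for $\alpha = 1$, can be checked with a few lines of code. This is a minimal sketch; the function names and the choice $T = 1$ are illustrative assumptions, and $\mathrm{sinc}(x)$ is taken as $\sin(\pi x)/(\pi x)$.

```python
import cmath
import math

def sinc(x):
    """sinc(x) = sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def psd_precoded(f, alpha, T=1.0):
    """T sinc^2(Tf) |1 + alpha*exp(-j*2*pi*f*T)|^2 for J_n = I_n + alpha*I_{n-1}."""
    shaping = abs(1 + alpha * cmath.exp(-2j * math.pi * f * T)) ** 2
    return T * sinc(T * f) ** 2 * shaping

def psd_expanded(f, alpha, T=1.0):
    """Same spectrum with the squared magnitude written out."""
    return T * sinc(T * f) ** 2 * (1 + alpha ** 2 + 2 * alpha * math.cos(2 * math.pi * f * T))

# Largest disagreement between the two forms over a small grid.
max_diff = max(abs(psd_precoded(f, a) - psd_expanded(f, a))
               for f in (0.0, 0.1, 0.37, 0.5, 0.9)
               for a in (-1.0, 0.0, 0.5, 1.0))
null_value = psd_precoded(0.5, 1.0)   # alpha = 1 evaluated at f = 1/(2T)
```

Because the factor $|1 + \alpha e^{-j2\pi fT}|^2$ multiplies whatever $|G(f)|^2$ is used, the null at $f = 1/2T$ would survive a change of pulse shape; only `sinc` would need to be replaced in the sketch.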


3.4-3 Power Spectral Density of Digitally Modulated Signals 
with Finite Memory 

We now focus on a special case where the data sequence $\{I_n\}$ is such that $I_n$ and $I_{n+k}$ are
independent for $|k| > K$, where $K$ is a positive integer representing the memory in the
information sequence. With this assumption, $S_l(f; \boldsymbol{I}_k)$ and $S_l^*(f; \boldsymbol{I}_0)$ are independent
for $k > K$, and by stationarity have equal expected values. Therefore,

$$G_k(f) = \left|E\left[S_l(f; \boldsymbol{I}_0)\right]\right|^2 = G_{K+1}(f), \qquad \text{for } |k| > K \tag{3.4-21}$$

Obviously, $G_{K+1}(f)$ is real. Let us define

$$G_k''(f) = G_k(f) - G_{K+1}(f) = G_k(f) - \left|E\left[S_l(f; \boldsymbol{I}_0)\right]\right|^2 \tag{3.4-22}$$

It is clear that $G_k''(f) = 0$ for $|k| > K$ and $G_{-k}''(f) = G_k^{\prime\prime\,*}(f)$. Also note that

$$G_0''(f) = G_0(f) - G_{K+1}(f) = E\left[|S_l(f; \boldsymbol{I}_0)|^2\right] - \left|E\left[S_l(f; \boldsymbol{I}_0)\right]\right|^2 = \mathrm{VAR}\left[S_l(f; \boldsymbol{I}_0)\right] \tag{3.4-23}$$

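In this case we can write Equation 3.4-6 in the following form:

$$\begin{aligned} S_{v_l}(f) &= \frac{1}{T}\sum_{k=-\infty}^{\infty}\left(G_k(f)-G_{K+1}(f)\right)e^{-j2\pi kfT} + \frac{1}{T}\sum_{k=-\infty}^{\infty}G_{K+1}(f)\,e^{-j2\pi kfT} \\ &= \frac{1}{T}\sum_{k=-K}^{K}G_k''(f)\,e^{-j2\pi kfT} + \frac{1}{T^2}\sum_{k=-\infty}^{\infty}G_{K+1}(f)\,\delta\left(f-\frac{k}{T}\right) \\ &= \frac{1}{T}\,\mathrm{VAR}\left[S_l(f; \boldsymbol{I}_0)\right] + \frac{2}{T}\,\mathrm{Re}\left[\sum_{k=1}^{K}G_k''(f)\,e^{-j2\pi kfT}\right] + \frac{1}{T^2}\sum_{k=-\infty}^{\infty}G_{K+1}\left(\frac{k}{T}\right)\delta\left(f-\frac{k}{T}\right) \\ &= S_{v_l}^{(c)}(f) + S_{v_l}^{(d)}(f) \end{aligned} \tag{3.4-24}$$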

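The continuous and discrete components of the power spectral density in this case
can be expressed as

$$\begin{aligned} S_{v_l}^{(c)}(f) &= \frac{1}{T}\,\mathrm{VAR}\left[S_l(f; \boldsymbol{I}_0)\right] + \frac{2}{T}\,\mathrm{Re}\left[\sum_{k=1}^{K}G_k''(f)\,e^{-j2\pi kfT}\right] \\ S_{v_l}^{(d)}(f) &= \frac{1}{T^2}\sum_{k=-\infty}^{\infty}G_{K+1}\left(\frac{k}{T}\right)\delta\left(f-\frac{k}{T}\right) \end{aligned} \tag{3.4-25}$$

Note that if $G_{K+1}\left(\frac{k}{T}\right) = 0$ for $k = 0, \pm 1, \pm 2, \ldots$, the discrete component of the
power spectrum vanishes. Since $G_{K+1}(f) = \left|E\left[S_l(f; \boldsymbol{I}_0)\right]\right|^2$, having $E\left[s_l(t; \boldsymbol{I}_0)\right] = 0$
guarantees a continuous power spectral density with no discrete components.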


3.4-4 Power Spectral Density of Modulation Schemes 
with a Markov Structure 

The power spectral density of modulation schemes with memory was derived in Equa- 
tions 3.4-6, 3.4-7, and 3.4-10. These results can be generalized to the general class of 
modulation systems that can be described in terms of a Markov chain. This is done by 
defining 


$$\boldsymbol{I}_n = (S_{n-1}, I_n) \tag{3.4-26}$$

where $S_{n-1} \in \{1, 2, \ldots, K\}$ denotes the state of the modulator at time $n - 1$ and $I_n$ is
the $n$th output of the information source. With the assumption that the Markov chain is
homogeneous, the source is stationary, and the Markov chain has achieved its
steady-state probabilities, the results of Section 3.4-1 apply and the power spectral density
can be derived.

In the particular case where the signals generated by the modulator are determined
by the state of the Markov chain, the derivation becomes simpler. Let us assume that
the Markov chain that determines signal generation has a probability transition matrix
denoted by $\boldsymbol{P}$. Let us further assume that the number of states is $K$ and the signal
generated when the modulator is in state $i$, $1 \le i \le K$, is denoted by $s_{il}(t)$. The
steady-state probabilities of the states of the Markov chain are denoted by $p_i$, $1 \le i \le K$, and
elements of the matrix $\boldsymbol{P}$ are denoted by $P_{ij}$, $1 \le i, j \le K$. With these assumptions
the results of Section 3.4-1 can be applied, and the power spectral density may be
expressed in the general form (see Tausworth and Welch, 1961)

$$\begin{aligned} S_v(f) = {} & \frac{1}{T^2}\sum_{n=-\infty}^{\infty}\left|\sum_{i=1}^{K} p_i\, S_{il}\left(\frac{n}{T}\right)\right|^2 \delta\left(f-\frac{n}{T}\right) + \frac{1}{T}\sum_{i=1}^{K} p_i \left|S_{il}'(f)\right|^2 \\ & + \frac{2}{T}\,\mathrm{Re}\left[\sum_{i=1}^{K}\sum_{j=1}^{K} p_i\, S_{il}^{\prime\,*}(f)\, S_{jl}'(f)\, P_{ij}(f)\right] \end{aligned} \tag{3.4-27}$$


where $S_{il}(f)$ is the Fourier transform of the signal waveform $s_{il}(t)$, $S_{il}'(f)$ is the Fourier transform of

$$s_{il}'(t) = s_{il}(t) - \sum_{k=1}^{K} p_k\, s_{kl}(t) \tag{3.4-28}$$

$P_{ij}(f)$ is the Fourier transform of the $n$-step state transition probabilities $P_{ij}(n)$, defined as

$$P_{ij}(f) = \sum_{n=1}^{\infty} P_{ij}(n)\, e^{-j2\pi nfT} \tag{3.4-29}$$

and $K$ is the number of states of the modulator. The term $P_{ij}(n)$ denotes the probability
that the signal $s_j(t)$ is transmitted $n$ signaling intervals after the transmission of $s_i(t)$.
Hence, $\{P_{ij}(n)\}$ are the transition probabilities in the transition probability matrix $\boldsymbol{P}^n$.
Note that $P_{ij}(1) = P_{ij}$, the $(i, j)$th entry in $\boldsymbol{P}$.

When there is no memory in the modulation method, the signal waveform transmit- 
ted on each signaling interval is independent of the waveforms transmitted in previous 
signaling intervals. The power density spectrum of the resultant signal may still be ex- 
pressed in the form of Equation 3.4-27, if the transition probability matrix is replaced by 


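$$\boldsymbol{P} = \begin{bmatrix} p_1 & p_2 & \cdots & p_K \\ p_1 & p_2 & \cdots & p_K \\ \vdots & \vdots & & \vdots \\ p_1 & p_2 & \cdots & p_K \end{bmatrix} \tag{3.4-30}$$

and $\boldsymbol{P}^n = \boldsymbol{P}$ for all $n \ge 1$. Under these conditions, the
expression for the power density spectrum becomes a function of the stationary state
probabilities $\{p_i\}$ only, and hence it reduces to the simpler form

$$\begin{aligned} S_v(f) = {} & \frac{1}{T^2}\sum_{n=-\infty}^{\infty}\left|\sum_{i=1}^{K} p_i\, S_{il}\left(\frac{n}{T}\right)\right|^2 \delta\left(f-\frac{n}{T}\right) + \frac{1}{T}\sum_{i=1}^{K} p_i(1-p_i)\left|S_{il}(f)\right|^2 \\ & - \frac{2}{T}\sum_{i=1}^{K}\sum_{\substack{j=1 \\ i<j}}^{K} p_i\, p_j\, \mathrm{Re}\left[S_{il}(f)\, S_{jl}^*(f)\right] \end{aligned} \tag{3.4-31}$$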


We observe that when

$$\sum_{i=1}^{K} p_i\, S_{il}\left(\frac{n}{T}\right) = 0 \tag{3.4-32}$$

the discrete component of the power spectral density in Equation 3.4-31 vanishes. This 
condition is usually imposed in the design of digital communication systems and is 
easily satisfied by an appropriate choice of signaling waveforms (Problem 3.34). 


EXAMPLE 3.4-2. Let us determine the power density spectrum of the
baseband-modulated NRZ signal described in Section 3.3. The NRZ signal is characterized by
the two waveforms $s_1(t) = g(t)$ and $s_2(t) = -g(t)$, where $g(t)$ is a rectangular pulse
of amplitude $A$. For $K = 2$, with $p_1 = p$ and $p_2 = 1 - p$, Equation 3.4-31 reduces to

$$S_v(f) = \frac{(2p-1)^2}{T^2}\sum_{n=-\infty}^{\infty}\left|G\left(\frac{n}{T}\right)\right|^2 \delta\left(f-\frac{n}{T}\right) + \frac{4p(1-p)}{T}\,|G(f)|^2 \tag{3.4-33}$$

where

$$|G(f)|^2 = (AT)^2\,\mathrm{sinc}^2(fT)$$

Observe that when $p = \frac{1}{2}$, the line spectrum vanishes and $S_v(f)$ reduces to

$$S_v(f) = \frac{1}{T}\,|G(f)|^2 \tag{3.4-34}$$

EXAMPLE 3.4-3. The NRZI signal is characterized by the transition probability matrix

$$\boldsymbol{P} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}$$

Notice that in this case $\boldsymbol{P}^n = \boldsymbol{P}$ for all $n \ge 1$. Hence, the special form for the
power density spectrum given by Equation 3.4-33 applies to this modulation format
as well. Consequently, the power density spectrum for the NRZI signal is identical to
the spectrum of the NRZ signal.
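The property $\boldsymbol{P}^n = \boldsymbol{P}$ is what allows the memoryless formula to be reused, and it can be confirmed by direct multiplication. This is a small sketch; the second matrix `Q` is a hypothetical chain with memory, included only for contrast.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

P = [[0.5, 0.5], [0.5, 0.5]]          # NRZI transition matrix from the example
power = P
all_equal = True
for n in range(2, 7):                 # check P^2 through P^6
    power = matmul(power, P)
    all_equal = all_equal and power == P

Q = [[0.9, 0.1], [0.2, 0.8]]          # hypothetical chain whose powers change with n
Q_squared = matmul(Q, Q)
```

Here `all_equal` stays true, whereas `Q_squared` differs from `Q`; for a chain like `Q` the general expression of Equation 3.4-27 would be needed.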


3.4-5 Power Spectral Densities of CPFSK and CPM Signals 

In this section, we derive the power density spectrum for the class of constant-amplitude
CPM signals described in Sections 3.3-1 and 3.3-2. We begin by computing the
autocorrelation function and its Fourier transform.

The constant-amplitude CPM signal is expressed as

$$s(t; \boldsymbol{I}) = A\cos\left[2\pi f_c t + \phi(t; \boldsymbol{I})\right] \tag{3.4-35}$$

where

$$\phi(t; \boldsymbol{I}) = 2\pi h \sum_{k=-\infty}^{\infty} I_k\, q(t-kT) \tag{3.4-36}$$

Each symbol in the sequence $\{I_n\}$ can take one of the $M$ values $\{\pm 1, \pm 3, \ldots,
\pm(M-1)\}$. These symbols are statistically independent and identically distributed
with prior probabilities

$$P_n = P(I_k = n), \qquad n = \pm 1, \pm 3, \ldots, \pm(M-1) \tag{3.4-37}$$

where $\sum_n P_n = 1$. The pulse $g(t) = q'(t)$ is zero outside of the interval $[0, LT]$,
$q(t) = 0$ for $t < 0$, and $q(t) = \frac{1}{2}$ for $t > LT$.

The autocorrelation function of the equivalent lowpass signal

$$v_l(t) = e^{j\phi(t; \boldsymbol{I})} \tag{3.4-38}$$

is

$$R_{v_l}(t+\tau; t) = E\left[\exp\left(j2\pi h \sum_{k=-\infty}^{\infty} I_k\left[q(t+\tau-kT) - q(t-kT)\right]\right)\right] \tag{3.4-39}$$

First, we express the sum in the exponent as a product of exponents. The result is

$$R_{v_l}(t+\tau; t) = E\left[\prod_{k=-\infty}^{\infty} \exp\left\{j2\pi h I_k\left[q(t+\tau-kT) - q(t-kT)\right]\right\}\right] \tag{3.4-40}$$

Next, we perform the expectation over the data symbols $\{I_k\}$. Since these symbols are
statistically independent, we obtain

$$R_{v_l}(t+\tau; t) = \prod_{k=-\infty}^{\infty}\left(\sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n \exp\left\{j2\pi hn\left[q(t+\tau-kT) - q(t-kT)\right]\right\}\right) \tag{3.4-41}$$

Finally, the average autocorrelation function is

$$\bar{R}_{v_l}(\tau) = \frac{1}{T}\int_0^T R_{v_l}(t+\tau; t)\, dt \tag{3.4-42}$$

Although Equation 3.4-41 implies that there are an infinite number of factors in
the product, the pulse $g(t) = q'(t) = 0$ for $t < 0$ and $t > LT$, and $q(t) = 0$ for $t < 0$.
Consequently only a finite number of terms in the product have nonzero exponents.
Thus Equation 3.4-41 can be simplified considerably. In addition, if we let $\tau = \xi + mT$,
where $0 \le \xi < T$ and $m = 0, 1, \ldots$, the average autocorrelation in Equation 3.4-42
reduces to

$$\bar{R}_{v_l}(\xi + mT) = \frac{1}{T}\int_0^T \prod_{k=1-L}^{m+1}\left(\sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n\, e^{j2\pi hn\left[q(t+\xi-(k-m)T) - q(t-kT)\right]}\right) dt \tag{3.4-43}$$

Let us focus on $\bar{R}_{v_l}(\xi + mT)$ for $\xi + mT \ge LT$. In this case, Equation 3.4-43 may
be expressed as

$$\bar{R}_{v_l}(\xi + mT) = \left[\Phi_I(h)\right]^{m-L}\lambda(\xi), \qquad m \ge L, \quad 0 \le \xi < T \tag{3.4-44}$$

where $\Phi_I(h)$ is the characteristic function of the random sequence $\{I_n\}$, defined as

$$\begin{aligned} \Phi_I(h) &= E\left[e^{j\pi hI_n}\right] \\ &= \sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n\, e^{j\pi hn} \end{aligned} \tag{3.4-45}$$

and $\lambda(\xi)$ is the remaining part of the average autocorrelation function, which may be
expressed as

$$\begin{aligned} \lambda(\xi) = \frac{1}{T}\int_0^T & \prod_{k=1-L}^{0}\left(\sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n \exp\left\{j2\pi hn\left[\tfrac{1}{2} - q(t-kT)\right]\right\}\right) \\ & \times \prod_{k=m-L}^{m+1}\left(\sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} P_n \exp\left[j2\pi hn\, q(t+\xi-kT)\right]\right) dt, \qquad m \ge L \end{aligned} \tag{3.4-46}$$

Thus, $\bar{R}_{v_l}(\tau)$ may be separated into a product of $\lambda(\xi)$ and $\Phi_I(h)$ as indicated in
Equation 3.4-44 for $\tau = \xi + mT \ge LT$ and $0 \le \xi < T$. This property is used below.

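The Fourier transform of $\bar{R}_{v_l}(\tau)$ yields the average power density spectrum as

$$\begin{aligned} S_{v_l}(f) &= \int_{-\infty}^{\infty} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau \\ &= 2\,\mathrm{Re}\left[\int_0^{\infty} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau\right] \end{aligned} \tag{3.4-47}$$

But

$$\int_0^{\infty} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau = \int_0^{LT} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau + \int_{LT}^{\infty} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau \tag{3.4-48}$$

With the aid of Equation 3.4-44, the integral in the range $LT \le \tau < \infty$ may be
expressed as

$$\int_{LT}^{\infty} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau = \sum_{m=L}^{\infty}\int_{mT}^{(m+1)T} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau \tag{3.4-49}$$

Now, let $\tau = \xi + mT$. Then Equation 3.4-49 becomes

$$\begin{aligned} \int_{LT}^{\infty} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau &= \sum_{m=L}^{\infty}\int_0^T \bar{R}_{v_l}(\xi + mT)\, e^{-j2\pi f(\xi+mT)}\, d\xi \\ &= \sum_{m=L}^{\infty}\int_0^T \lambda(\xi)\left[\Phi_I(h)\right]^{m-L} e^{-j2\pi f(\xi+mT)}\, d\xi \\ &= \sum_{n=0}^{\infty} \Phi_I^n(h)\, e^{-j2\pi fnT}\int_0^T \lambda(\xi)\, e^{-j2\pi f(\xi+LT)}\, d\xi \end{aligned} \tag{3.4-50}$$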


A property of the characteristic function is $|\Phi_I(h)| \le 1$. For values of $h$ for which
$|\Phi_I(h)| < 1$, the summation in Equation 3.4-50 converges and yields

$$\sum_{n=0}^{\infty} \Phi_I^n(h)\, e^{-j2\pi fnT} = \frac{1}{1 - \Phi_I(h)\, e^{-j2\pi fT}} \tag{3.4-51}$$

In this case, Equation 3.4-50 reduces to

$$\int_{LT}^{\infty} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau = \frac{1}{1 - \Phi_I(h)\, e^{-j2\pi fT}}\int_0^T \bar{R}_{v_l}(\xi + LT)\, e^{-j2\pi f(\xi+LT)}\, d\xi \tag{3.4-52}$$

By combining Equations 3.4-47, 3.4-48, and 3.4-52, we obtain the power density
spectrum of the CPM signal in the form

$$S_{v_l}(f) = 2\,\mathrm{Re}\left[\int_0^{LT} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau + \frac{1}{1 - \Phi_I(h)\, e^{-j2\pi fT}}\int_{LT}^{(L+1)T} \bar{R}_{v_l}(\tau)\, e^{-j2\pi f\tau}\, d\tau\right] \tag{3.4-53}$$


This is the desired result when $|\Phi_I(h)| < 1$. In general, the power density spectrum
is evaluated numerically from Equation 3.4-53. The average autocorrelation function
$\bar{R}_{v_l}(\tau)$ for the range $0 \le \tau \le (L+1)T$ may be computed numerically from
Equation 3.4-43.
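As an illustration of this numerical route, the sketch below evaluates Equations 3.4-43 and 3.4-53 for one simple case and compares the result with the closed-form MSK spectrum quoted later in Equation 3.4-63 (with $A = 1$). The assumptions are: binary equiprobable symbols, a rectangular $g(t)$ of duration $T$ (so $L = 1$), $h = \frac{1}{2}$, for which $\Phi_I(h) = \cos(\pi/2) = 0$, and plain midpoint-rule integration with hypothetical step counts.

```python
import math

def q_rect(t, T=1.0):
    """Phase pulse q(t) for a rectangular g(t) of duration T (L = 1)."""
    if t <= 0.0:
        return 0.0
    if t >= T:
        return 0.5
    return t / (2.0 * T)

def avg_autocorr(tau, h=0.5, T=1.0, L=1, steps=200):
    """Equation 3.4-43 for binary equiprobable symbols, where the inner
    sum over n = +/-1 reduces to a cosine."""
    m = int(tau // T)
    dt = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        prod = 1.0
        for k in range(1 - L, m + 2):
            diff = q_rect(t + tau - k * T, T) - q_rect(t - k * T, T)
            prod *= math.cos(2 * math.pi * h * diff)
        total += prod
    return total * dt / T

def psd_msk_numeric(f, T=1.0, steps=400):
    """Equation 3.4-53 with L = 1 and Phi_I = 0: 2 * integral over [0, 2T]
    of R(tau) * cos(2*pi*f*tau), since R(tau) is real here."""
    dtau = 2.0 * T / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * dtau
        total += avg_autocorr(tau, T=T) * math.cos(2 * math.pi * f * tau)
    return 2.0 * total * dtau

def psd_msk_closed(f, T=1.0):
    """Closed-form MSK spectrum with A = 1 (undefined as written at f = 1/(4T))."""
    return 16 * T / math.pi ** 2 * (math.cos(2 * math.pi * f * T) / (1 - 16 * f * f * T * T)) ** 2

checks = [(f, psd_msk_numeric(f), psd_msk_closed(f)) for f in (0.0, 0.3, 0.6)]
```

The numerical and closed-form values agree to within the integration error. For other pulse shapes or modulation indices, `q_rect` and the inner cosine would have to be replaced by the appropriate $q(t)$ and symbol average, and the $\Phi_I(h)$ factor of Equation 3.4-53 restored.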

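For values of $h$ for which $|\Phi_I(h)| = 1$, e.g., $h = K$, where $K$ is an integer, we can
set

$$\Phi_I(h) = e^{j2\pi\nu}, \qquad 0 \le \nu < 1 \tag{3.4-54}$$

Then the sum in Equation 3.4-50 becomes

$$\sum_{n=0}^{\infty} e^{-j2\pi T(f-\nu/T)n} = \frac{1}{2} + \frac{1}{2T}\sum_{n=-\infty}^{\infty}\delta\left(f - \frac{\nu}{T} - \frac{n}{T}\right) - j\,\frac{1}{2}\cot \pi T\left(f - \frac{\nu}{T}\right) \tag{3.4-55}$$

Thus, the power density spectrum now contains impulses located at frequencies

$$f_n = \frac{n+\nu}{T}, \qquad 0 \le \nu < 1, \quad n = 0, 1, 2, \ldots \tag{3.4-56}$$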


The result in Equation 3.4-55 can be combined with Equations 3.4-50 and 3.4-48 to
obtain the entire power density spectrum, which includes both a continuous spectrum
component and a discrete spectrum component.

Let us return to the case for which $|\Phi_I(h)| < 1$. When symbols are equally probable,
i.e.,

$$P_n = \frac{1}{M} \qquad \text{for all } n \tag{3.4-57}$$

the characteristic function simplifies to the form

$$\begin{aligned} \Phi_I(h) &= \frac{1}{M}\sum_{\substack{n=-(M-1) \\ n\ \text{odd}}}^{M-1} e^{j\pi hn} \\ &= \frac{1}{M}\,\frac{\sin M\pi h}{\sin \pi h} \end{aligned} \tag{3.4-58}$$

Note that in this case $\Phi_I(h)$ is real. The average autocorrelation function given by
Equation 3.4-43 also simplifies in this case to

$$\bar{R}_{v_l}(\tau) = \frac{1}{2T}\int_0^T \prod_{k=1-L}^{[\tau/T]} \frac{1}{M}\,\frac{\sin 2\pi hM\left[q(t+\tau-kT) - q(t-kT)\right]}{\sin 2\pi h\left[q(t+\tau-kT) - q(t-kT)\right]}\, dt \tag{3.4-59}$$

The corresponding expression for the power density spectrum reduces to

$$\begin{aligned} S_{v_l}(f) = 2\Bigg[ & \int_0^{LT} \bar{R}_{v_l}(\tau)\cos 2\pi f\tau\, d\tau \\ & + \frac{1 - \Phi_I(h)\cos 2\pi fT}{1 + \Phi_I^2(h) - 2\Phi_I(h)\cos 2\pi fT}\int_{LT}^{(L+1)T} \bar{R}_{v_l}(\tau)\cos 2\pi f\tau\, d\tau \\ & - \frac{\Phi_I(h)\sin 2\pi fT}{1 + \Phi_I^2(h) - 2\Phi_I(h)\cos 2\pi fT}\int_{LT}^{(L+1)T} \bar{R}_{v_l}(\tau)\sin 2\pi f\tau\, d\tau \Bigg] \end{aligned} \tag{3.4-60}$$

Power Spectral Density of CPFSK 

A closed-form expression for the power density spectrum can be obtained from
Equation 3.4-60 when the pulse shape $g(t)$ is rectangular and zero outside the interval
$[0, T]$. In this case, $q(t)$ is linear for $0 \le t \le T$. The resulting power spectrum may be
expressed as

$$S_v(f) = T\left[\frac{1}{M}\sum_{n=1}^{M} A_n^2(f) + \frac{2}{M^2}\sum_{n=1}^{M}\sum_{m=1}^{M} B_{nm}(f)\, A_n(f)\, A_m(f)\right] \tag{3.4-61}$$

where

$$\begin{aligned} A_n(f) &= \frac{\sin \pi\left[fT - \frac{1}{2}(2n-1-M)h\right]}{\pi\left[fT - \frac{1}{2}(2n-1-M)h\right]} \\ B_{nm}(f) &= \frac{\cos(2\pi fT - \alpha_{nm}) - \Phi\cos\alpha_{nm}}{1 + \Phi^2 - 2\Phi\cos 2\pi fT} \\ \alpha_{nm} &= \pi h(m+n-1-M) \\ \Phi &\equiv \Phi(h) = \frac{\sin M\pi h}{M\sin \pi h} \end{aligned} \tag{3.4-62}$$

The power density spectrum of CPFSK for $M = 2$, 4, and 8 is plotted in
Figures 3.4-1 to 3.4-3 as a function of the normalized frequency $fT$, with the modulation
index $h = 2f_dT$ as a parameter. Note that only one-half of the bandwidth occupancy
is shown in these graphs. The origin corresponds to the carrier $f_c$. The graphs illustrate
that the spectrum of CPFSK is relatively smooth and well confined for $h < 1$. As $h$
approaches unity, the spectra become very peaked, and for $h = 1$ when $|\Phi| = 1$, we
find that impulses occur at $M$ frequencies. When $h > 1$, the spectrum becomes much
broader. In communication systems where CPFSK is used, the modulation index is
designed to conserve bandwidth, so that $h < 1$.

FIGURE 3.4-1
Power spectral density of binary CPFSK. [Panels (a) to (d): spectral density for two-level CPFSK versus normalized frequency $fT$.]

FIGURE 3.4-2
Power spectral density of quaternary CPFSK. [Panels: spectral density for four-level CPFSK versus normalized frequency $fT$.]
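Equations 3.4-61 and 3.4-62 are straightforward to evaluate. The following sketch is illustrative rather than definitive: the function names are hypothetical, $T = 1$ by default, and $h$ is assumed non-integer so that $\sin \pi h \ne 0$. As a sanity check it compares the binary case with $h = \frac{1}{2}$ against the MSK spectrum of Equation 3.4-63 with $A = 1$.

```python
import math

def sin_over_x(x):
    """sin(x)/x with the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def cpfsk_psd(f, h, M=2, T=1.0):
    """Power spectrum of M-ary CPFSK from Equations 3.4-61 and 3.4-62
    (h must not be an integer)."""
    phi = math.sin(M * math.pi * h) / (M * math.sin(math.pi * h))
    A = [sin_over_x(math.pi * (f * T - 0.5 * (2 * n - 1 - M) * h))
         for n in range(1, M + 1)]
    denom = 1.0 + phi * phi - 2.0 * phi * math.cos(2 * math.pi * f * T)
    single = sum(a * a for a in A) / M
    double = 0.0
    for n in range(1, M + 1):
        for m in range(1, M + 1):
            alpha = math.pi * h * (m + n - 1 - M)
            B = (math.cos(2 * math.pi * f * T - alpha) - phi * math.cos(alpha)) / denom
            double += B * A[n - 1] * A[m - 1]
    return T * (single + 2.0 * double / (M * M))

def msk_psd(f, T=1.0):
    """MSK spectrum with A = 1 (avoid f = 1/(4T), where this form is 0/0)."""
    return 16 * T / math.pi ** 2 * (math.cos(2 * math.pi * f * T) / (1 - 16 * f * f * T * T)) ** 2

comparison = [(f, cpfsk_psd(f, 0.5), msk_psd(f)) for f in (0.0, 0.5, 0.9)]
quaternary_at_carrier = cpfsk_psd(0.0, 0.25, M=4)
```

Sweeping `h` toward 1 in `cpfsk_psd` shows the peaking described above, since the denominator of $B_{nm}(f)$ approaches zero near the frequencies where the impulses eventually appear.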

The special case of binary CPFSK with $h = \frac{1}{2}$ (or $f_d = 1/4T$) and $\Phi = 0$
corresponds to MSK. In this case, the spectrum of the signal is

$$S_v(f) = \frac{16A^2T}{\pi^2}\left(\frac{\cos 2\pi fT}{1 - 16f^2T^2}\right)^2 \tag{3.4-63}$$

where the signal amplitude $A = 1$ in Equation 3.4-62. In contrast, the spectrum
of four-phase offset (quadrature) PSK (OQPSK) with a rectangular pulse $g(t)$ of
duration $T$ is

$$S_v(f) = A^2T\left(\frac{\sin \pi fT}{\pi fT}\right)^2 \tag{3.4-64}$$

FIGURE 3.4-3
Power spectral density of octal CPFSK. [Panels: spectral density for eight-level CPFSK.]

If we compare these spectral characteristics, we should normalize the frequency
variable by the bit rate or the bit interval $T_b$. Since MSK is binary FSK, it follows
that $T = T_b$ in Equation 3.4-63. On the other hand, in OQPSK, $T = 2T_b$ so that
Equation 3.4-64 becomes

$$S_v(f) = 2A^2T_b\left(\frac{\sin 2\pi fT_b}{2\pi fT_b}\right)^2 \tag{3.4-65}$$

The spectra of the MSK and OQPSK signals are illustrated in Figure 3.4-4. Note
that the main lobe of MSK is 50 percent wider than that for OQPSK. However, the side
lobes in MSK fall off considerably faster. For example, if we compare the bandwidth
$W$ that contains 99 percent of the total power, we find that $W = 1.2/T_b$ for MSK
and $W \approx 8/T_b$ for OQPSK. Consequently, MSK has a narrower spectral occupancy
when viewed in terms of fractional out-of-band power above $fT_b = 1$. Graphs for
the fractional out-of-band power for OQPSK and MSK are shown in Figure 3.4-5.
Note that MSK is significantly more bandwidth-efficient than QPSK. This efficiency
accounts for the popularity of MSK in many digital communication systems.
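
The two spectra in Equations 3.4-63 and 3.4-65 can be evaluated numerically to check the main-lobe and side-lobe behavior described above. The following Python fragment is only a sketch, assuming NumPy is available; the function names `msk_psd` and `oqpsk_psd` and the default values `Tb = 1` and `A = 1` are illustrative choices, not part of the text.

```python
import numpy as np

def msk_psd(f, Tb=1.0, A=1.0):
    """MSK spectrum, Equation 3.4-63 with T = Tb (hypothetical helper)."""
    x = np.asarray(f, dtype=float) * Tb               # normalized frequency f*Tb
    den = 1.0 - 16.0 * x**2
    singular = np.isclose(den, 0.0)                   # |f*Tb| = 1/4 is a 0/0 point
    ratio = np.where(singular, np.pi / 4.0,           # limit of cos(2*pi*x)/(1 - 16*x^2)
                     np.cos(2.0 * np.pi * x) / np.where(singular, 1.0, den))
    return 16.0 * A**2 * Tb / np.pi**2 * ratio**2

def oqpsk_psd(f, Tb=1.0, A=1.0):
    """OQPSK spectrum with a rectangular pulse, Equation 3.4-65 (hypothetical helper)."""
    x = np.asarray(f, dtype=float) * Tb
    # np.sinc(u) = sin(pi*u)/(pi*u), so sin(2*pi*x)/(2*pi*x) = np.sinc(2*x)
    return 2.0 * A**2 * Tb * np.sinc(2.0 * x)**2

if __name__ == "__main__":
    for fTb in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"f*Tb = {fTb:4.2f}  MSK = {float(msk_psd(fTb)):.4e}  "
              f"OQPSK = {float(oqpsk_psd(fTb)):.4e}")
```

In this sketch the first spectral null falls at $fT_b = 3/4$ for MSK and at $fT_b = 1/2$ for OQPSK, which is the 50 percent wider main lobe noted in the text. Far from the carrier the MSK values are much smaller, because Equation 3.4-63 decays as $f^{-4}$ while Equation 3.4-65 decays as $f^{-2}$.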

Even greater bandwidth efficiency than MSK can be achieved by reducing the 
modulation index. However, the FSK signals will no longer be orthogonal, and there 
will be an increase in the error probability. 


Spectral Characteristics of CPM 

In general, the bandwidth occupancy of CPM depends on the choice of the modulation 
index h, the pulse shape g(t), and the number of signals M. As we have observed 


[FIGURE 3.4-4: Power spectral density of MSK and OQPSK. Vertical axis: normalized power spectral density (dB); horizontal axis: normalized frequency offset from carrier (f - f_c)T_b [(Hz/bit)/s]. Source: Gronemeyer and McBride (1976); © IEEE.]

[FIGURE 3.4-5: Fractional out-of-band power (normalized two-sided bandwidth = 2WT). Horizontal axis: 2WT = two-sided normalized bandwidth [(Hz/bit)/s]. Source: Gronemeyer and McBride (1976); © IEEE.]


for CPFSK, small values of h result in CPM signals with relatively small bandwidth 
occupancy, while large values of h result in signals with large bandwidth occupancy. 
This is also the case for the more general CPM signals.

[FIGURE 3.4-6: Power spectral density (dB) for binary CPM with $h = \frac{1}{2}$ and different pulse shapes. Source: Aulin et al. (1981); © IEEE.]


The use of smooth pulses such as raised cosine pulses of the form

$$g(t) = \begin{cases} \dfrac{1}{2LT} \left( 1 - \cos \dfrac{2\pi t}{LT} \right) & 0 \le t \le LT \\ 0 & \text{otherwise} \end{cases} \tag{3.4-66}$$


where $L = 1$ for full response and $L > 1$ for partial response, results in smaller
bandwidth occupancy and hence greater bandwidth efficiency than in the use of rectangular
pulses. For example, Figure 3.4-6 illustrates the power density spectrum for binary CPM
with different partial-response raised cosine (LRC) pulses when $h = \frac{1}{2}$. For comparison,
the spectrum of binary CPFSK is also shown. Note that as $L$ increases, the pulse $g(t)$
becomes smoother and the corresponding spectral occupancy of the signal is reduced.
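
The LRC pulse of Equation 3.4-66 is easy to tabulate. The Python sketch below, which assumes NumPy, evaluates the pulse and integrates it numerically; the names `lrc_pulse` and `pulse_area` and the grid size are hypothetical and serve only as an illustration.

```python
import numpy as np

def lrc_pulse(t, L=1, T=1.0):
    """Raised cosine pulse of Equation 3.4-66, nonzero only on 0 <= t <= L*T."""
    t = np.asarray(t, dtype=float)
    inside = (t >= 0.0) & (t <= L * T)
    g = (1.0 - np.cos(2.0 * np.pi * t / (L * T))) / (2.0 * L * T)
    return np.where(inside, g, 0.0)

def pulse_area(L, T=1.0, n=20001):
    """Area under g(t) on [0, L*T], computed with the trapezoidal rule."""
    t = np.linspace(0.0, L * T, n)
    g = lrc_pulse(t, L, T)
    dt = t[1] - t[0]
    return float(np.sum(0.5 * (g[:-1] + g[1:])) * dt)

if __name__ == "__main__":
    for L in (1, 2, 3):
        peak = float(lrc_pulse(L / 2.0, L))   # the maximum sits at t = L*T/2
        print(f"L = {L}: peak = {peak:.4f}, area = {pulse_area(L):.6f}")
```

Integrating Equation 3.4-66 gives an area of 1/2 for every $L$, while the peak value $1/(LT)$ drops as $L$ grows. The pulse therefore becomes longer, lower, and smoother with increasing $L$, which matches the trend in spectral occupancy described above.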

The effect of varying the modulation index in a CPM signal is illustrated in
Figure 3.4-7 for the case of $M = 4$ and a raised cosine pulse of the form given in
Equation 3.4-66 with $L = 3$. Note that these spectral characteristics are similar to the
ones illustrated previously for CPFSK, except that these spectra are narrower due to
the use of a smoother pulse shape.

[FIGURE 3.4-7: Power spectral density (dB) for M = 4 CPM with 3RC and different modulation indices. Source: Aulin et al. (1981); © IEEE.]

3.5 BIBLIOGRAPHICAL NOTES AND REFERENCES

The digital modulation methods introduced in this chapter are widely used in digital
communication systems. Chapter 4 is concerned with optimum demodulation techniques
for these signals and their performance in an additive white Gaussian noise
channel. A general reference for signal characterization is the book by Franks (1969).

Of particular importance in the design of digital communication systems are the
spectral characteristics of the digitally modulated signals, which are presented in this
chapter in some depth. Of these modulation techniques, CPM is one of the most important
due to its efficient use of bandwidth. For this reason, it has been widely investigated
by many researchers, and a large number of papers have been published in the technical
literature. The most comprehensive treatment of CPM, including its performance
and its spectral characteristics, can be found in the book by Anderson et al. (1986). In
addition to this text, the tutorial paper by Sundberg (1986) presents the basic concepts
and an overview of the performance characteristics of various CPM techniques.

The linear representation of CPM was developed by Laurent (1986) for binary
modulation. It was extended to M-ary CPM signals by Mengali and Morelli (1995).
Rimoldi (1988) showed that a CPM system can be decomposed into a continuous-phase
encoder and a memoryless modulator. This paper also contains over 100 references to
published papers on this topic.

There are a large number of references dealing with the spectral characteristics of
CPFSK and CPM. As a point of reference, we should mention that MSK was invented
by Doelz and Heald in 1961. The early work on the power spectral density of CPFSK
and CPM was done by Bennett and Rice (1963), Anderson and Salz (1965), and Bennett
and Davey (1965). The book by Lucky et al. (1968) also contains a treatment of the
spectral characteristics of CPFSK. Most of the recent work is referenced in the paper
by Sundberg (1986). We should also cite the special issue on bandwidth-efficient
modulation and coding published by the IEEE Transactions on Communications (March
1981), which contains several papers on the spectral characteristics and performance of
CPM. The generalization of MSK to multiple amplitudes was investigated by Weber et
al. (1978). The combination of multiple amplitudes with general CPM was proposed by
Mulligan (1988), who investigated its spectral characteristics and its error probability
performance in Gaussian noise with and without coding.


PROBLEMS 

3.1 Using the identity

$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$$

show that

$$1^2 + 3^2 + 5^2 + \cdots + (M-1)^2 = \frac{M(M^2 - 1)}{6}$$

and derive Equation 3.2-5.
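
Before attempting the proof, the closed form in Problem 3.1 can be spot-checked for a few even values of $M$. The short Python sketch below is only a numerical sanity check, not a derivation; the function name `odd_square_sum` is hypothetical, and $M$ is assumed to be even so that $M - 1$ is the last odd integer in the sum.

```python
def odd_square_sum(M):
    """Return 1^2 + 3^2 + ... + (M-1)^2 for an even integer M."""
    return sum(k * k for k in range(1, M, 2))

if __name__ == "__main__":
    for M in (2, 4, 8, 16):
        lhs = odd_square_sum(M)
        rhs = M * (M * M - 1) // 6   # closed form stated in the problem
        print(M, lhs, rhs)
```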

3.2 Determine the signal space representation of the four signals $s_k(t)$, $k = 1, 2, 3, 4$, shown
in Figure P3.2, by using as basis functions the orthonormal functions $\phi_1(t)$ and $\phi_2(t)$. Plot
the signal space diagram, and show that this signal set is equivalent to that for a four-phase
PSK signal.

[FIGURE P3.2: The four signal waveforms $s_1(t)$, $s_2(t)$, $s_3(t)$, and $s_4(t)$.]


3.3 $\pi/4$-QPSK may be considered as two QPSK systems offset by $\pi/4$ rad.

1. Sketch the signal space diagram for a $\pi/4$-QPSK signal.

2. Using Gray encoding, label the signal points with the corresponding data bits. 

3.4 Consider the octal signal point constellations in Figure P3.4. 

1. The nearest-neighbor signal points in the 8-QAM signal constellation are separated 
in distance by A units. Determine the radii a and b of the inner and outer circles, 
respectively. 

2. The adjacent signal points in the 8-PSK are separated by a distance of A units. Determine 
the radius r of the circle. 




3. Determine the average transmitter powers for the two signal constellations, and compare
the two powers. What is the relative power advantage of one constellation over the other?
(Assume that all signal points are equally probable.)

[FIGURE P3.4: Octal signal constellations: 8-PSK and 8-QAM.]

3.5 Consider the 8-point QAM signal constellation shown in Figure P3.4. 

1. Is it possible to assign 3 data bits to each point of the signal constellation such that the 
nearest (adjacent) points differ in only 1 bit position? 

2. Determine the symbol rate if the desired bit rate is 90 Mbits/s. 

3.6 Consider the two 8-point QAM signal constellations shown in Figure P3.6. The minimum 
distance between adjacent points is 2 A. Determine the average transmitted power for each 
constellation, assuming that the signal points are equally probable. Which constellation is 
more power-efficient? 

[FIGURE P3.6: Two 8-point QAM signal constellations, (a) and (b).]


3.7 Specify a Gray code for the 16-QAM signal constellation shown in Figure P3.7. 


[FIGURE P3.7: 16-QAM signal constellation; the axes carry tick marks from -7 to 7 in steps of 2.]


3.8 In an MSK signal, the initial state for the phase is either 0 or $\pi$ rad. Determine the terminal
phase state for the following four pairs of input data:

1. 00
2. 01
3. 10
4. 11

3.9 Determine the number of states in the state trellis diagram for 

1. A full-response binary CPFSK with h = | or 

2. A partial-response L = 3 binary CPFSK with h = | or | .

3.10 A speech signal is sampled at a rate of 8 kHz, and then encoded using 8 bits per sample.
The resulting binary data are then transmitted through an AWGN baseband channel via
M-level PAM. Determine the bandwidth required for transmission when

1. M = 4
2. M = 8
3. M = 16


3.11 The power density spectrum of the cyclostationary process

$$v(t) = \sum_{n=-\infty}^{\infty} I_n g(t - nT)$$

can be derived by averaging the autocorrelation function $R_v(t + \tau, t)$ over the period $T$
of the process and then evaluating the Fourier transform of the average autocorrelation
function. An alternative approach is to change the cyclostationary process into a stationary
process $v_\Delta(t)$ by adding a random variable $\Delta$, uniformly distributed over $0 \le \Delta \le T$, so
that

$$v_\Delta(t) = \sum_{n=-\infty}^{\infty} I_n g(t - nT - \Delta)$$

and defining the spectral density of $v(t)$ as the Fourier transform of the autocorrelation
function of the stationary process $v_\Delta(t)$. Derive the result in Equation 3.4-16 by evaluating
the autocorrelation function of $v_\Delta(t)$ and its Fourier transform.

3.12 Show that 16-QAM can be represented as a superposition of two four-phase
constant-envelope signals where each component is amplified separately before summing, i.e.,

$$s(t) = G(A_n \cos 2\pi f_c t + B_n \sin 2\pi f_c t) + (C_n \cos 2\pi f_c t + D_n \sin 2\pi f_c t)$$

where $\{A_n\}$, $\{B_n\}$, $\{C_n\}$, and $\{D_n\}$ are statistically independent binary sequences with
elements from the set $\{+1, -1\}$ and $G$ is the amplifier gain. Thus, show that the resulting
signal is equivalent to

$$s(t) = I_n \cos 2\pi f_c t + Q_n \sin 2\pi f_c t$$

and determine $I_n$ and $Q_n$ in terms of $A_n$, $B_n$, $C_n$, and $D_n$.


3.13 Consider a four-phase PSK signal represented by the equivalent lowpass signal

$$u(t) = \sum_n I_n g(t - nT)$$

where $I_n$ takes on one of the four possible values $\sqrt{\frac{1}{2}}(\pm 1 \pm j)$ with equal probability. The
sequence of information symbols $\{I_n\}$ is statistically independent.

1. Determine and sketch the power density spectrum of $u(t)$ when

$$g(t) = \begin{cases} A & 0 \le t \le T \\ 0 & \text{otherwise} \end{cases}$$

2. Repeat Part 1 when

$$g(t) = \begin{cases} A \sin(\pi t / T) & 0 \le t \le T \\ 0 & \text{otherwise} \end{cases}$$

3. Compare the spectra obtained in Parts 1 and 2 in terms of the 3-dB bandwidth and the
bandwidth to the first spectral zero.

3.14 A PAM partial-response signal (PRS) is generated as shown in Figure P3.14 by exciting
an ideal lowpass filter of bandwidth $W$ by the sequence

$$B_n = I_n + I_{n-1}$$

at a rate $1/T = 2W$ symbols/s. The sequence $\{I_n\}$ consists of binary digits selected
independently from the alphabet $\{1, -1\}$ with equal probability. Hence, the filtered signal
has the form

$$v(t) = \sum_{n=-\infty}^{\infty} B_n g(t - nT), \qquad T = \frac{1}{2W}$$

[FIGURE P3.14: Generation of the partial-response signal, ending at the filter output.]

1. Sketch the signal space diagram for $v(t)$, and determine the probability of occurrence
of each symbol.

2. Determine the autocorrelation and power density spectrum of the three-level sequence
$\{B_n\}$.

3. The signal points of the sequence $\{B_n\}$ form a Markov chain. Sketch this Markov chain,
and indicate the transition probabilities among the states.

3.15 The lowpass equivalent representation of a PAM signal is

$$u(t) = \sum_n I_n g(t - nT)$$

Suppose $g(t)$ is a rectangular pulse and

$$I_n = a_n - a_{n-2}$$

where $\{a_n\}$ is a sequence of uncorrelated binary-valued $(1, -1)$ random variables that occur
with equal probability.

1. Determine the autocorrelation function of the sequence $\{I_n\}$.

2. Determine the power density spectrum of $u(t)$.

3. Repeat (2) if the possible values of the $a_n$ are $(0, 1)$.

3.16 Use the results in Section 3.4-4 to determine the power density spectrum of the binary
FSK signals in which the waveforms are

$$s_i(t) = \sin \omega_i t, \quad i = 1, 2, \quad 0 \le t \le T$$

where $\omega_1 = n\pi/T$ and $\omega_2 = m\pi/T$, $n \ne m$, and $m$ and $n$ are arbitrary positive integers.
Assume that $p_1 = p_2 = \frac{1}{2}$. Sketch the spectrum, and compare this result with the spectrum
of the MSK signal.

3.17 Use the results in Section 3.4-4 to determine the power density spectrum of multitone FSK
(MFSK) signals for which the signal waveforms are

$$s_n(t) = \sin \frac{2\pi n t}{T}, \quad n = 1, 2, \ldots, M, \quad 0 \le t \le T$$

Assume that the probabilities $p_n = 1/M$ for all $n$. Sketch the power spectral density.

3.18 A quadrature partial-response signal (QPRS) is generated by two separate partial-response
signals of the type described in Problem 3.14 placed in phase quadrature. Hence, the QPRS
is represented as

$$s(t) = \mathrm{Re}\left[ v(t) e^{j 2\pi f_c t} \right]$$

where

$$v(t) = v_c(t) + j v_s(t) = \sum_n B_n g(t - nT) + j \sum_n C_n g(t - nT)$$

and $B_n = I_n + I_{n-1}$ and $C_n = J_n + J_{n-1}$. The sequences $\{B_n\}$ and $\{C_n\}$ are independent,
and $I_n = \pm 1$, $J_n = \pm 1$ with equal probability.

1. Sketch the signal space diagram for the QPRS signal, and determine the probability of
occurrence of each symbol.

2. Determine the autocorrelations and power spectral density of $v_c(t)$, $v_s(t)$, and $v(t)$.

3. Sketch the Markov chain model, and indicate the transition probabilities for the QPRS.


3.19 The information sequence $\{a_n\}_{n=-\infty}^{\infty}$ is a sequence of iid random variables, each taking
values $+1$ and $-1$ with equal probability. This sequence is to be transmitted at baseband
by a biphase coding scheme, described by

$$s(t) = \sum_{n=-\infty}^{\infty} a_n g(t - nT)$$

where $g(t)$ is shown in Figure P3.19.

[FIGURE P3.19: The pulse $g(t)$.]

1. Find the power spectral density of $s(t)$.

2. Assume that it is desirable to have a zero in the power spectrum at $f = 1/T$. To this
end, we use a precoding scheme by introducing $b_n = a_n + k a_{n-1}$, where $k$ is some
constant, and then transmit the $\{b_n\}$ sequence using the same $g(t)$. Is it possible to
choose $k$ to produce a frequency null at $f = 1/T$? If yes, what are the appropriate
values and the resulting power spectrum?

3. Now assume we want to have zeros at all multiples of $f_0 = 1/4T$. Is it possible to have
these zeros with an appropriate choice of $k$ in the previous part? If not, then what kind
of precoding do you suggest to achieve the desired result?


3.20 The two signal waveforms for binary FSK signal transmission with discontinuous phase
are

$$s_0(t) = \sqrt{\frac{2E_b}{T_b}} \cos\left[ 2\pi \left( f_c - \frac{\Delta f}{2} \right) t + \theta_0 \right], \quad 0 \le t < T$$

$$s_1(t) = \sqrt{\frac{2E_b}{T_b}} \cos\left[ 2\pi \left( f_c + \frac{\Delta f}{2} \right) t + \theta_1 \right], \quad 0 \le t < T$$

where $\Delta f = 1/T \ll f_c$, and $\theta_0$ and $\theta_1$ are independent uniformly distributed random
variables on the interval $(0, 2\pi)$. The signals $s_0(t)$ and $s_1(t)$ are equally probable.

1. Determine the power spectral density of the FSK signal.

2. Show that the power spectral density decays as $1/f^2$ for $f \gg f_c$.


3.21 The elements of the sequence $\{I_n\}_{n=-\infty}^{+\infty}$ are independent binary random variables taking
values of $\pm 1$ with equal probability. This data sequence is used to modulate the basic pulse
$u(t)$ shown in Figure P3.21(a). The modulated signal is

$$X(t) = \sum_{n=-\infty}^{+\infty} I_n u(t - nT)$$

[FIGURE P3.21(a): The pulse $u(t)$, with amplitude A and the time axis marked at 0 and T.]


1. Find the power spectral density of X(t). 

2. If $u_1(t)$, shown in Figure P3.21(b), were used instead of $u(t)$, how would the power
spectrum in part 1 change?

3. In part 2, assume we want to have a null in the spectrum at / = This is done by a 
precoding of the form $b_n = I_n + \alpha I_{n-1}$. Find the value of $\alpha$ that provides the desired
null.

[FIGURE P3.21(b): The pulse $u_1(t)$, with the time axis marked at 0 and 2T.]


4. Is it possible to employ a precoding of the form b n = /„ + a,- for some finite 

N such that the final power spectrum will be identical to zero for ^ < |/| < ^? If 
yes, how? If no, why? (Hint: Use properties of analytic functions.) 


3.22 A digital signaling scheme is defined as

$$X(t) = \sum_{n=-\infty}^{\infty} \left[ a_n u(t - nT) \cos(2\pi f_c t) - b_n u(t - nT) \sin(2\pi f_c t) \right]$$

where $u(t) = \Lambda(t/2T)$,

$$\Lambda(t) = \begin{cases} t + 1 & -1 \le t \le 0 \\ -t + 1 & 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases}$$

and each $(a_n, b_n)$ pair is independent from the others and is equally likely to take any of
the three values $(0, 1)$, $(\sqrt{3}/2, -1/2)$, and $(-\sqrt{3}/2, -1/2)$.

1. Determine the lowpass equivalent of the modulated signal. Determine the in-phase and
quadrature components.

2. Determine the power spectral density of the lowpass equivalent signal; from this
determine the power spectral density of the modulated signal.

3. By employing a precoding scheme of the form

$$\begin{cases} c_n = a_n + \alpha a_{n-1} \\ d_n = b_n + \alpha b_{n-1} \end{cases}$$

where $\alpha$ is in general a complex number, and transmitting the signal

$$Y(t) = \sum_{n=-\infty}^{\infty} \left[ c_n u(t - nT) \cos(2\pi f_c t) - d_n u(t - nT) \sin(2\pi f_c t) \right]$$

we want to have a lowpass signal that has no dc component. Is it possible to achieve
this goal by an appropriate choice of $\alpha$? If yes, find this value.


3.23 A binary memoryless source generates the equiprobable outputs $\{a_k\}_{k=-\infty}^{\infty}$ which take
values in $\{0, 1\}$. The source is modulated by mapping each sequence of length 3 of the
source outputs into one of the eight possible $\{\alpha_i, \theta_i\}_{i=1}^{8}$ pairs and generating the modulated
sequence

$$s(t) = \sum_{n=-\infty}^{\infty} \alpha_n g(t - nT) \cos(2\pi f_0 t + \theta_n)$$

where

$$g(t) = \begin{cases} 2t/T & 0 \le t < T/2 \\ 2 - 2t/T & T/2 \le t < T \\ 0 & \text{otherwise} \end{cases}$$

1. Find the power spectral density of $s(t)$ in terms of $\alpha^2 = \sum_{i=1}^{8} |\alpha_i|^2$ and $\beta = \sum_{i=1}^{8} \alpha_i e^{j\theta_i}$.

2. For the special case of $\alpha_{\text{odd}} = a$, $\alpha_{\text{even}} = b$, and $\theta_i = (i - 1)\pi/4$, determine the power
spectral density of $s(t)$.

3. Show that for $a = b$, case 2 reduces to a standard 8-PSK signaling scheme, and
determine the power spectrum in this case.

4. If a precoding of the form $b_n = a_n \oplus a_{n-1}$ (where $\oplus$ denotes the binary addition) were
applied to the source outputs prior to modulation, how would the results in parts 1, 2,
and 3 change?

3.24 An information source generates the ternary sequence $\{I_n\}_{n=-\infty}^{\infty}$. Each $I_n$ can take one of
the three possible values 2, 0, and $-2$ with probabilities 1/4, 1/2, and 1/4, respectively.
The source outputs are assumed to be independent. The source outputs are used to generate
the lowpass signal

$$v(t) = \sum_{n=-\infty}^{\infty} I_n g(t - nT)$$

1. Determine the power spectral density of the process $v(t)$, assuming $g(t)$ is the signal
shown in Figure P3.24.

2. Determine the power spectral density of

$$w(t) = \sum_{n=-\infty}^{\infty} J_n g(t - nT)$$

where $J_n = I_{n-1} + I_n + I_{n+1}$.


[FIGURE P3.24: The pulse $g(t)$.]

3.25 The information sequence $\{a_n\}$ is an iid sequence taking the values $-1$, 2, and 0 with
probabilities 1/4, 1/4, and 1/2. This information sequence is used to generate the baseband
signal

$$v(t) = \sum_{n=-\infty}^{\infty} a_n \,\mathrm{sinc}\left( \frac{t - nT}{T} \right)$$

1. Determine the power spectral density of $v(t)$.

2. Define the sequence $\{b_n\}$ as $b_n = a_n + a_{n-1} - a_{n-2}$ and generate the baseband signal

$$u(t) = \sum_{n=-\infty}^{\infty} b_n \,\mathrm{sinc}\left( \frac{t - nT}{T} \right)$$

Determine the power spectral density of $u(t)$. What are the possible values for the $b_n$
sequence?

3. Now let us assume $w(t)$ is defined as

$$w(t) = \sum_{n=-\infty}^{\infty} c_n \,\mathrm{sinc}\left( \frac{t - nT}{T} \right)$$

where $c_n = a_n + j a_{n-1}$. Determine the power spectral density of $w(t)$.

(Hint: You can use the relation $\sum_{m=-\infty}^{\infty} e^{j 2\pi f m T} = \frac{1}{T} \sum_{m=-\infty}^{\infty} \delta(f - m/T)$.)

3.26 Let $\{a_n\}_{n=-\infty}^{\infty}$ denote an information sequence of independent random variables, taking
values of $\pm 1$ with equal probability. A QPSK signal is generated by modulating a
rectangular pulse shape of duration $2T$ by even and odd indexed $a_n$'s to obtain the in-phase and
quadrature components of the modulated signal. In other words, we have

$$g_{2T}(t) = \begin{cases} 1 & 0 \le t < 2T \\ 0 & \text{otherwise} \end{cases}$$

and we generate the in-phase and quadrature components according to

$$x_i(t) = \sum_{n=-\infty}^{\infty} a_{2n} \, g_{2T}(t - 2nT)$$

$$x_q(t) = \sum_{n=-\infty}^{\infty} a_{2n+1} \, g_{2T}(t - 2nT)$$

Then $x_l(t) = x_i(t) + j x_q(t)$ and $x(t) = \mathrm{Re}\left[ x_l(t) e^{j 2\pi f_0 t} \right]$.

1. Determine the power spectral density of $x_l(t)$.

2. Now let $x_q(t) = \sum_{n=-\infty}^{\infty} a_{2n+1} \, g_{2T}[t - (2n + 1)T]$; in other words, let the quadrature
component stagger the in-phase component by $T$. This results in an OQPSK system.
Determine the power spectral density of $x_l(t)$ in this case. How does this compare with
the result of part 1?

3. If in part 2 instead of $g_{2T}(t)$ we employ the following sinusoidal signal

$$g_1(t) = \begin{cases} \sin\left( \dfrac{\pi t}{2T} \right) & 0 \le t < 2T \\ 0 & \text{otherwise} \end{cases}$$

the resulting modulated signal will be an MSK signal. Determine the power spectral
density of $x_l(t)$ in this case.

4. Show that in the case of MSK signaling, although the basic pulse $g_1(t)$ does not have
a constant amplitude, the overall signal has a constant envelope.
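
The constant-envelope property in part 4 of Problem 3.26 can be observed numerically before it is proved. The Python sketch below assumes NumPy and a half-sinusoid pulse $\sin(\pi t/2T)$ on $0 \le t < 2T$ for both components, with the quadrature pulses staggered by $T$; the function name `msk_lowpass` and the sampling parameters are hypothetical. It is an illustration, not a substitute for the analytical argument.

```python
import numpy as np

def msk_lowpass(a, T=1.0, samples_per_T=50):
    """Build x_l(t) = x_i(t) + j*x_q(t) from +/-1 symbols a[0], a[1], ...

    Even-indexed symbols drive the in-phase pulses starting at 2nT;
    odd-indexed symbols drive the quadrature pulses starting at (2n+1)T.
    """
    a = np.asarray(a, dtype=float)
    n_pairs = len(a) // 2
    t = np.arange((2 * n_pairs + 1) * samples_per_T) * (T / samples_per_T)
    x_i = np.zeros_like(t)
    x_q = np.zeros_like(t)
    for n in range(n_pairs):
        for x, sym, start in ((x_i, a[2 * n], 2 * n * T),
                              (x_q, a[2 * n + 1], (2 * n + 1) * T)):
            tau = t - start
            on = (tau >= 0.0) & (tau < 2.0 * T)        # pulse support of length 2T
            x[on] += sym * np.sin(np.pi * tau[on] / (2.0 * T))
    return t, x_i + 1j * x_q

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    symbols = rng.choice([-1.0, 1.0], size=20)
    t, x = msk_lowpass(symbols)
    interior = (t >= 1.0) & (t < 20.0)                 # both components are active here
    print(np.abs(x[interior]).min(), np.abs(x[interior]).max())
```

Only the interval where both components are present is examined, since the quadrature stream starts one interval $T$ late. On that interval the in-phase magnitude is $|\sin(\pi t/2T)|$ and the quadrature magnitude is $|\cos(\pi t/2T)|$, so the squared envelope sums to one.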

3.27 $\{a_n\}_{n=-\infty}^{\infty}$ is a sequence of iid random variables each taking 0 or 1 with equal probability.

1. The sequence $b_n$ is defined as $b_n = a_{n-1} \oplus a_n$ where $\oplus$ denotes binary addition
(EXCLUSIVE-OR). Determine the autocorrelation function for the sequence $b_n$ and
the power spectral density of the PAM signal

$$v(t) = \sum_{n=-\infty}^{\infty} b_n g(t - nT)$$

where

$$g(t) = \begin{cases} 1 & 0 \le t < T \\ 0 & \text{otherwise} \end{cases}$$

2. Compare the result in part 1 with the result when $b_n = a_{n-1} + a_n$.

3.28 Consider the signal constellation shown in Figure P3.28. 

FIGURE P3.28 



The lowpass equivalent of the transmitted signal is represented as

$$s_l(t) = \sum_{n=-\infty}^{\infty} a_n g(t - nT)$$

where $g(t)$ is a rectangular pulse defined as

$$g(t) = \begin{cases} 1 & 0 \le t < T \\ 0 & \text{otherwise} \end{cases}$$

and the $a_n$'s are independent and identically distributed (iid) random variables that can
assume the points in the constellation with equal probability.

1. Determine the power spectral density of the signal $s_l(t)$.

2. Determine the power spectral density of the transmitted signal $s(t)$, assuming that the
carrier frequency is $f_0$ (assuming $f_0 \gg \frac{1}{T}$).

3. Determine and plot the power spectral density of $s_l(t)$ for the case when $r_1 = r_2$ (plot
the PSD as a function of $fT$).

3.29 Determine the autocorrelation functions for the MSK and offset QPSK modulated signals 
based on the assumption that the information sequences for each of the two signals are 
uncorrelated and zero-mean. 

3.30 Sketch the phase tree, the state trellis, and the state diagram for partial-response CPM with
$h = \frac{1}{2}$ and

$$g(t) = \begin{cases} 1/4T & 0 \le t \le 2T \\ 0 & \text{otherwise} \end{cases}$$


3.31 Determine the number of terminal phase states in the state trellis diagram for 

1. A full-response binary CPFSK with h = | or

2. A partial-response L = 3 binary CPFSK with h = I or | . 


3.32 In the linear representation of CPM, show that the time durations of the $2^{L-1}$ pulses $\{c_k(t)\}$
are as follows:

$$\begin{aligned}
c_0(t) &= 0, & t < 0 \text{ and } t > (L + 1)T \\
c_1(t) &= 0, & t < 0 \text{ and } t > (L - 1)T \\
c_2(t) = c_3(t) &= 0, & t < 0 \text{ and } t > (L - 2)T \\
c_4(t) = c_5(t) = c_6(t) = c_7(t) &= 0, & t < 0 \text{ and } t > (L - 3)T \\
&\;\;\vdots \\
c_{2^{L-2}}(t) = \cdots = c_{2^{L-1}-1}(t) &= 0, & t < 0 \text{ and } t > T
\end{aligned}$$


3.33 Use the result in Equation 3.4-31 to derive the expression for the power density spectrum 
of memoryless linear modulation given by Equation 3.4-16 under the condition that 

s k (t) = I k s(t), k = 1, 2, . . . , K 

where I k is one of the K possible transmitted symbols that occur with equal probability. 

3.34 Show that a sufficient condition for the absence of the line spectrum component in
Equation 3.4-31 is

$$\sum_{i=1}^{K} p_i s_i(t) = 0$$

Is this condition necessary? Justify your answer.



Optimum Receivers for AWGN Channels 


In Chapter 3, we described various types of modulation methods that may be used to 
transmit digital information through a communication channel. As we have observed, 
the modulator at the transmitter performs the function of mapping the information 
sequence into signal waveforms. These waveforms are transmitted over the channel, 
and a corrupted version of them is received at the receiver. 

In Chapter 1 we have seen that communication channels can suffer from a variety
of impairments that contribute to errors. These impairments include noise, attenuation,
distortion, fading, and interference. Characteristics of a communication channel determine
which impairments apply to that particular channel and which are the determining
factors in the performance of the channel. Noise is present in all communication channels
and is the major impairment in many communication systems. In this chapter we
study the effect of noise on the reliability of the modulation systems studied in Chapter 3.
In particular, this chapter deals with the design and performance characteristics
of optimum receivers for the various modulation methods when the channel corrupts
the transmitted signal by the addition of white Gaussian noise.


4.1 WAVEFORM AND VECTOR CHANNEL MODELS

The additive white Gaussian noise (AWGN) channel model is a channel whose sole 
effect is addition of a white Gaussian noise process to the transmitted signal. This 
channel is mathematically described by the relation 

$$r(t) = s_m(t) + n(t) \tag{4.1-1}$$

where $s_m(t)$ is the transmitted signal which, as we have seen in Chapter 3, is one of
$M$ possible signals; $n(t)$ is a sample waveform of a zero-mean white Gaussian noise
process with power spectral density of $N_0/2$; and $r(t)$ is the received waveform. This
channel model is shown in Figure 4.1-1.

[FIGURE 4.1-1: Model for received signal passed through an AWGN channel. The transmitted signal $s_m(t)$ and the noise $n(t)$ are added in the channel to give the received signal $r(t) = s_m(t) + n(t)$.]

The receiver observes the received signal $r(t)$ and, based on this observation, makes
the optimal decision about which message $m$, $1 \le m \le M$, was transmitted. By an
optimal decision we mean a decision rule which results in minimum error probability,
i.e., the decision rule that minimizes the probability of disagreement between the
transmitted message $m$ and the detected message $\hat{m}$ given by

$$P_e = P[\hat{m} \ne m] \tag{4.1-2}$$

Although the AWGN channel model seems very limiting, its study is beneficial
from two points of view. First, noise is the major type of corruption introduced by many
channels. Therefore isolating it from other channel impairments and studying its effect
results in better understanding of its effect on all communication systems. Second,
the AWGN channel, although very simple, is a good model for studying deep space
communication channels, which were historically one of the first challenges encountered
by communication engineers.

We have seen in Chapter 3 that by using an orthonormal basis $\{\phi_j(t), 1 \le j \le N\}$,
each signal $s_m(t)$ can be represented by a vector $\mathbf{s}_m \in \mathbb{R}^N$. It was also shown in
Example 2.8-1 that any orthonormal basis can be used for expansion of a zero-mean
white Gaussian process, and the resulting coefficients of expansion will be iid zero-mean
Gaussian random variables with variance $N_0/2$. Therefore, $\{\phi_j(t), 1 \le j \le N\}$,
when extended appropriately, can be used for expansion of the noise process $n(t)$. This
observation prompts us to view the waveform channel $r(t) = s_m(t) + n(t)$ in the vector
form $\mathbf{r} = \mathbf{s}_m + \mathbf{n}$ where all vectors are $N$-dimensional and components of $\mathbf{n}$ are iid
zero-mean Gaussian random variables with variance $N_0/2$. We will give a rigorous
proof of this equivalence in Section 4.2. We continue our analysis with the study of the
vector channel introduced above.

4.1-1 Optimal Detection for a General Vector Channel

The mathematical model for the AWGN vector channel is given by

$$\mathbf{r} = \mathbf{s}_m + \mathbf{n} \tag{4.1-3}$$

where all vectors are $N$-dimensional real vectors. The message $m$ is chosen according
to probabilities $P_m$ from the set of possible messages $\{1, 2, \ldots, M\}$. The noise
components $n_j$, $1 \le j \le N$, are iid, zero-mean, Gaussian random variables each distributed
according to $\mathcal{N}(0, N_0/2)$. Therefore, the PDF of the noise vector $\mathbf{n}$ is given by

$$p(\mathbf{n}) = \left( \frac{1}{\sqrt{\pi N_0}} \right)^{N} e^{-\sum_{j=1}^{N} n_j^2 / N_0} = \left( \frac{1}{\sqrt{\pi N_0}} \right)^{N} e^{-\|\mathbf{n}\|^2 / N_0} \tag{4.1-4}$$

[FIGURE 4.1-2: A general vector channel. The input $\mathbf{s}_m$ passes through a channel described by $p(\mathbf{r}|\mathbf{s}_m)$ to produce the output $\mathbf{r}$.]
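
The AWGN vector channel of Equation 4.1-3 and the noise density of Equation 4.1-4 can be illustrated with a short simulation. The Python sketch below assumes NumPy; the function names `awgn_vector_channel` and `noise_pdf`, the test vector, and the value of $N_0$ are hypothetical choices made only for illustration.

```python
import numpy as np

def awgn_vector_channel(s_m, N0, rng):
    """Return r = s_m + n, where the components of n are iid N(0, N0/2)."""
    s_m = np.asarray(s_m, dtype=float)
    n = rng.normal(loc=0.0, scale=np.sqrt(N0 / 2.0), size=s_m.shape)
    return s_m + n

def noise_pdf(n, N0):
    """Joint density of an N-dimensional noise vector, as in Equation 4.1-4."""
    n = np.asarray(n, dtype=float)
    N = n.size
    return (np.pi * N0) ** (-N / 2.0) * np.exp(-np.dot(n, n) / N0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = np.tile([1.0, -1.0], (100000, 1))      # the same 2-D signal sent many times
    r = awgn_vector_channel(s, N0=2.0, rng=rng)
    noise = r - s
    print("sample variance per component:", noise.var())   # should be near N0/2
    print("density at the origin for N0 = 1:", noise_pdf([0.0, 0.0], 1.0))
```

With $N_0 = 2$ the sample variance per component should come out close to $N_0/2 = 1$, and for $N = 2$, $N_0 = 1$ the density at the origin equals $1/\pi$.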


We, however, study a more general vector channel model in this section which is
not limited to the AWGN channel model. This model will later be specialized to an
AWGN channel model in Section 4.2. In our model, vectors $\mathbf{s}_m$ are selected from a set
of possible signal vectors $\{\mathbf{s}_m, 1 \le m \le M\}$ according to prior or a priori probabilities
$P_m$ and transmitted over the channel. The received vector $\mathbf{r}$ depends statistically on the
transmitted vector through the conditional probability density functions $p(\mathbf{r}|\mathbf{s}_m)$. The
channel model is shown in Figure 4.1-2.

The receiver observes $\mathbf{r}$ and based on this observation decides which message was
transmitted. Let us denote the decision function employed at the receiver by $g(\mathbf{r})$, which
is a function from $\mathbb{R}^N$ into the set of messages $\{1, 2, \ldots, M\}$. Now if $g(\mathbf{r}) = \hat{m}$, i.e.,
the receiver decides that $\hat{m}$ was transmitted, then the probability that this decision is
correct is the probability that $\hat{m}$ was in fact the transmitted message. In other words,
the probability of a correct decision, given that $\mathbf{r}$ is received, is given by

$$P[\text{correct decision} \mid \mathbf{r}] = P[\hat{m} \text{ sent} \mid \mathbf{r}] \tag{4.1-5}$$

and therefore the probability of a correct decision is

$$\begin{aligned}
P[\text{correct decision}] &= \int P[\text{correct decision} \mid \mathbf{r}] \, p(\mathbf{r}) \, d\mathbf{r} \\
&= \int P[\hat{m} \text{ sent} \mid \mathbf{r}] \, p(\mathbf{r}) \, d\mathbf{r}
\end{aligned} \tag{4.1-6}$$

Our goal is to design an optimal detector that minimizes the error probability or,
equivalently, maximizes $P[\text{correct decision}]$. Since $p(\mathbf{r})$ is nonnegative for all $\mathbf{r}$, the
right-hand side of Equation 4.1-6 is maximized if for each $\mathbf{r}$ the quantity $P[\hat{m} \mid \mathbf{r}]$ is
maximized. This means that the optimal detection rule is the one that upon observing
$\mathbf{r}$ decides in favor of the message $m$ that maximizes $P[m \mid \mathbf{r}]$. In other words,

$$\hat{m} = g_{\text{opt}}(\mathbf{r}) = \arg\max_{1 \le m \le M} P[m \mid \mathbf{r}] \tag{4.1-7}$$

The optimal detection scheme described in Equation 4.1-7 simply looks among all
$P[m \mid \mathbf{r}]$ for $1 \le m \le M$ and selects the $m$ that maximizes $P[m \mid \mathbf{r}]$. The detector then
declares this maximizing $m$ as its best decision. Note that since transmitting message
$m$ is equivalent to transmitting $\mathbf{s}_m$, the optimal decision rule can be written as

$$\hat{m} = g_{\text{opt}}(\mathbf{r}) = \arg\max_{1 \le m \le M} P[\mathbf{s}_m \mid \mathbf{r}] \tag{4.1-8}$$


MAP and ML Receivers 

The optimal decision rule given by Equations 4.1-7 and 4.1-8 is known as the maximum
a posteriori probability rule, or MAP rule. Note that, by Bayes' rule, the MAP receiver
can be simplified to

$$\hat{m} = \arg\max_{1 \le m \le M} \frac{P_m \, p(\mathbf{r}|\mathbf{s}_m)}{p(\mathbf{r})} \tag{4.1-9}$$

and since $p(\mathbf{r})$ does not depend on $m$, this is equivalent to

$$\hat{m} = \arg\max_{1 \le m \le M} P_m \, p(\mathbf{r}|\mathbf{s}_m) \tag{4.1-10}$$

Equation 4.1-10 is easier to use than Equation 4.1-7 since it is given in terms of the
prior probabilities $P_m$ and the probabilistic description of the channel $p(\mathbf{r}|\mathbf{s}_m)$, both
directly known.

In the case where the messages are equiprobable a priori, i.e., when $P_m = \frac{1}{M}$ for
all $1 \le m \le M$, the optimal detection rule reduces to

$$\hat{m} = \arg\max_{1 \le m \le M} p(\mathbf{r}|\mathbf{s}_m) \tag{4.1-11}$$

The term $p(\mathbf{r}|\mathbf{s}_m)$ is called the likelihood of message $m$, and the receiver given by
Equation 4.1-11 is called the maximum-likelihood receiver, or ML receiver. It is important
to note that the ML detector is not an optimal detector unless the messages are
equiprobable. The ML detector, however, is a very popular detector since in many cases
having exact information about message probabilities is difficult.
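
The difference between the MAP rule of Equation 4.1-10 and the ML rule of Equation 4.1-11 shows up as soon as the priors are unequal. The Python sketch below assumes NumPy and, purely for illustration, uses the AWGN likelihood implied by Equation 4.1-4, that is, $p(\mathbf{r}|\mathbf{s}_m) = p_{\mathbf{n}}(\mathbf{r} - \mathbf{s}_m)$; the general theory in this section does not require that choice. The function names, the binary antipodal constellation, and the zero-based message indices are all hypothetical.

```python
import numpy as np

def awgn_likelihoods(r, signals, N0):
    """Return p(r | s_m) for every row s_m of `signals` (AWGN assumption)."""
    r = np.asarray(r, dtype=float)
    S = np.asarray(signals, dtype=float)            # shape (M, N)
    N = S.shape[1]
    d2 = np.sum((S - r) ** 2, axis=1)               # squared distances ||r - s_m||^2
    return (np.pi * N0) ** (-N / 2.0) * np.exp(-d2 / N0)

def map_detect(r, signals, priors, N0):
    """Index m maximizing P_m * p(r | s_m), as in Equation 4.1-10."""
    metric = np.asarray(priors, dtype=float) * awgn_likelihoods(r, signals, N0)
    return int(np.argmax(metric))

def ml_detect(r, signals, N0):
    """Index m maximizing p(r | s_m), as in Equation 4.1-11."""
    return int(np.argmax(awgn_likelihoods(r, signals, N0)))

if __name__ == "__main__":
    signals = [[-1.0], [1.0]]                       # one-dimensional antipodal pair
    r = [0.2]
    print("ML decision: ", ml_detect(r, signals, N0=1.0))
    print("MAP decision:", map_detect(r, signals, priors=[0.9, 0.1], N0=1.0))
```

For the received value $r = 0.2$ the ML rule picks the nearer signal (index 1), while the MAP rule with priors 0.9 and 0.1 picks index 0, because the strong prior outweighs the modest likelihood advantage. With equal priors the two rules agree.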


The Decision Regions 

Any detector, including MAP and ML detectors, partitions the output space $\mathbb{R}^N$ into
$M$ regions denoted by $D_1, D_2, \ldots, D_M$ such that if $\mathbf{r} \in D_m$, then $\hat{m} = g(\mathbf{r}) = m$, i.e.,
the detector makes a decision in favor of $m$. The region $D_m$, $1 \le m \le M$, is called
the decision region for message $m$; and $D_m$ is the set of all outputs of the channel that
are mapped into message $m$ by the detector. If a MAP detector is employed, then the
$D_m$'s constitute the optimal decision regions resulting in the minimum possible error
probability. For a MAP detector we have

$$D_m = \left\{ \mathbf{r} \in \mathbb{R}^N : P[m \mid \mathbf{r}] > P[m' \mid \mathbf{r}], \text{ for all } 1 \le m' \le M \text{ and } m' \ne m \right\} \tag{4.1-12}$$

Note that if for some given $\mathbf{r}$ two or more messages achieve the maximum a posteriori
probability, we can arbitrarily assign $\mathbf{r}$ to one of the corresponding decision regions.

The Error Probability 

To determine the error probability of a detection scheme, we note that when $\mathbf{s}_m$ is
transmitted, an error occurs when the received $\mathbf{r}$ is not in $D_m$. The symbol error probability
of a receiver with decision regions $\{D_m, 1 \le m \le M\}$ is therefore given by

$$P_e = \sum_{m=1}^{M} P_m\, P[\mathbf{r} \notin D_m \mid \mathbf{s}_m \text{ sent}] = \sum_{m=1}^{M} P_m P_{e|m}$$ (4.1-13)

where $P_{e|m}$ denotes the error probability when message $m$ is transmitted and is given by

$$P_{e|m} = \int_{D_m^c} p(\mathbf{r} \mid \mathbf{s}_m)\, d\mathbf{r} = \sum_{\substack{1 \le m' \le M \\ m' \ne m}} \int_{D_{m'}} p(\mathbf{r} \mid \mathbf{s}_m)\, d\mathbf{r}$$ (4.1-14)

Using Equation 4.1-14 in Equation 4.1-13 gives

$$P_e = \sum_{m=1}^{M} P_m \sum_{\substack{1 \le m' \le M \\ m' \ne m}} \int_{D_{m'}} p(\mathbf{r} \mid \mathbf{s}_m)\, d\mathbf{r}$$ (4.1-15)


Equation 4.1-15 gives the probability that an error occurs in transmission of a symbol or
a message and is called symbol error probability or message error probability. Another
type of error probability is the bit error probability. This error probability is denoted by
$P_b$ and is the error probability in transmission of a single bit. Determining the bit error
probability in general requires detailed knowledge of how different bit sequences are
mapped to signal points. Therefore, in general finding the bit error probability is not easy
unless the constellation exhibits certain symmetry properties that simplify the derivation.
We will see later in this chapter that orthogonal signaling
exhibits the required symmetry for calculation of the bit error probability. In other cases
we can bound the bit error probability by noting that a symbol error occurs when at
least one bit is in error, and the event of a symbol error is the union of the events of the
errors in the $k = \log_2 M$ bits representing that symbol. Therefore we can write

$$P_b \le P_e \le k P_b$$ (4.1-16)

or

$$\frac{P_e}{\log_2 M} \le P_b \le P_e$$ (4.1-17)


EXAMPLE 4.1-1. Consider two equiprobable message signals $\mathbf{s}_1 = (0, 0)$ and $\mathbf{s}_2 = (1, 1)$.
The channel adds iid noise components $n_1$ and $n_2$ to the transmitted vector, each
with an exponential PDF of the form

$$p(n) = \begin{cases} e^{-n} & n \ge 0 \\ 0 & n < 0 \end{cases}$$

Since the messages are equiprobable, the MAP detector is equivalent to the ML
detector, and the decision region $D_1$ is given by

$$D_1 = \{\mathbf{r} \in \mathbb{R}^2 : p(\mathbf{r} \mid \mathbf{s}_1) > p(\mathbf{r} \mid \mathbf{s}_2)\}$$

Noting that $p(\mathbf{r} \mid \mathbf{s} = (s_1, s_2)) = p(\mathbf{n} = \mathbf{r} - \mathbf{s})$, we have

$$D_1 = \{\mathbf{r} \in \mathbb{R}^2 : p_n(r_1, r_2) > p_n(r_1 - 1, r_2 - 1)\}$$

where

$$p_n(n_1, n_2) = \begin{cases} e^{-n_1 - n_2} & n_1, n_2 > 0 \\ 0 & \text{otherwise} \end{cases}$$

From this relation we conclude that if either $r_1$ or $r_2$ is less than 1, then the point $\mathbf{r}$
belongs to $D_1$, and if both $r_1$ and $r_2$ are greater than 1, we have $e^{-r_1 - r_2} < e^{-(r_1 - 1) - (r_2 - 1)}$
and $\mathbf{r}$ belongs to $D_2$.

Note that in this channel neither $r_1$ nor $r_2$ can be negative, because signal and noise
are always nonnegative. Therefore,

$$D_2 = \{\mathbf{r} \in \mathbb{R}^2 : r_1 \ge 1, r_2 \ge 1\}$$

and

$$D_1 = \{\mathbf{r} \in \mathbb{R}^2 : r_1, r_2 \ge 0, \text{ either } 0 \le r_1 < 1 \text{ or } 0 \le r_2 < 1\}$$

The decision regions are shown in Figure 4.1-3. For this channel, when $\mathbf{s}_2$ is transmitted,
regardless of the value of noise components, $\mathbf{r}$ will always be in $D_2$ and no error will
occur.

Errors will occur only when $\mathbf{s}_1 = (0, 0)$ is transmitted and the received vector
$\mathbf{r}$ belongs to $D_2$, i.e., when both noise components exceed 1. Therefore, the error
probability is given by

$$P_e = \frac{1}{2} P[\mathbf{r} \in D_2 \mid \mathbf{s}_1 = (0, 0) \text{ sent}] = \frac{1}{2} \int_1^{\infty} e^{-n_1}\, dn_1 \int_1^{\infty} e^{-n_2}\, dn_2 = \frac{1}{2} e^{-2} \approx 0.068$$
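The result of Example 4.1-1 can be checked numerically. The following is a minimal Monte Carlo sketch, assuming unit-mean exponential noise as in the example; the function names, trial count, and seed are arbitrary illustrative choices.

```python
import math
import random


def detect(r1: float, r2: float) -> int:
    """Decision rule of Example 4.1-1: choose message 2 only if both components are at least 1."""
    return 2 if (r1 >= 1.0 and r2 >= 1.0) else 1


def simulate_error_rate(trials: int, seed: int = 0) -> float:
    """Estimate the symbol error probability by simulating the additive exponential-noise channel."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        m = rng.choice((1, 2))                      # equiprobable messages
        s = (0.0, 0.0) if m == 1 else (1.0, 1.0)    # signal points s_1 and s_2
        r1 = s[0] + rng.expovariate(1.0)            # iid exponential noise, p(n) = e^{-n}, n >= 0
        r2 = s[1] + rng.expovariate(1.0)
        if detect(r1, r2) != m:
            errors += 1
    return errors / trials


# Closed-form value derived in the example: (1/2) e^{-2}.
exact_error_probability = 0.5 * math.exp(-2.0)
```

For a large number of trials the estimate returned by `simulate_error_rate` should settle near `exact_error_probability`, which is about 0.068.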


Sufficient Statistics 

Let us assume that at the receiver we have access to a vector $\mathbf{r}$ that can be written in
terms of two vectors $\mathbf{r}_1$ and $\mathbf{r}_2$, i.e., $\mathbf{r} = (\mathbf{r}_1, \mathbf{r}_2)$. We further assume that $\mathbf{s}_m$, $\mathbf{r}_1$, and
$\mathbf{r}_2$ constitute a Markov chain in the given order, i.e.,

$$p(\mathbf{r}_1, \mathbf{r}_2 \mid \mathbf{s}_m) = p(\mathbf{r}_1 \mid \mathbf{s}_m)\, p(\mathbf{r}_2 \mid \mathbf{r}_1)$$ (4.1-18)

Under these assumptions $\mathbf{r}_2$ can be ignored in the detection of $\mathbf{s}_m$, and the detection
can be based only on $\mathbf{r}_1$. The reason is that by Equation 4.1-10

$$\begin{aligned}
\hat{m} &= \arg\max_{1 \le m \le M} P_m\, p(\mathbf{r} \mid \mathbf{s}_m) \\
&= \arg\max_{1 \le m \le M} P_m\, p(\mathbf{r}_1, \mathbf{r}_2 \mid \mathbf{s}_m) \\
&= \arg\max_{1 \le m \le M} P_m\, p(\mathbf{r}_1 \mid \mathbf{s}_m)\, p(\mathbf{r}_2 \mid \mathbf{r}_1) \\
&= \arg\max_{1 \le m \le M} P_m\, p(\mathbf{r}_1 \mid \mathbf{s}_m)
\end{aligned}$$ (4.1-19)

where in the last step we have ignored the positive factor $p(\mathbf{r}_2 \mid \mathbf{r}_1)$ since it does not
depend on $m$. This shows that the optimal detection can be based only on $\mathbf{r}_1$.

When the Markov chain relation among $\mathbf{s}_m$, $\mathbf{r}_1$, and $\mathbf{r}_2$ as given in
Equation 4.1-18 is satisfied, it is said that $\mathbf{r}_1$ is a sufficient statistic for detection of $\mathbf{s}_m$.
In such a case, when $\mathbf{r}_2$ can be ignored without sacrificing the optimality of the
receiver, $\mathbf{r}_2$ is called irrelevant data or irrelevant information. Recognizing sufficient
statistics helps to reduce the complexity of the detection process through ignoring a
usually large amount of irrelevant data at the receiver.

EXAMPLE 4.1-2. Let us assume that in Example 4.1-1, in addition to $\mathbf{r}$, the receiver
can observe $n_1$ as well. Therefore, we can assume that $\mathbf{r} = (\mathbf{r}_1, \mathbf{r}_2)$ is available at the
receiver, where $\mathbf{r}_1 = (r_1, n_1)$ and $\mathbf{r}_2 = r_2$. To design the optimal detector, we notice
that having access to both $r_1$ and $n_1$ uniquely determines $s_{m1}$ at the receiver; and since
$s_{11} = 0$ and $s_{21} = 1$, this uniquely determines the message $m$, thus making $\mathbf{r}_2 = r_2$
irrelevant. The optimal decision rule in this case becomes

$$\hat{m} = \begin{cases} 1 & \text{if } r_1 - n_1 = 0 \\ 2 & \text{if } r_1 - n_1 = 1 \end{cases}$$ (4.1-20)

and the resulting error probability is zero.


Preprocessing at the Receiver 

Let us assume that the receiver applies an invertible operation $G(\cdot)$ on the received
vector $\mathbf{r}$. In other words, instead of supplying $\mathbf{r}$ to the detector, the receiver passes $\mathbf{r}$
through $G$ and supplies the detector with $\boldsymbol{\rho} = G(\mathbf{r})$, as shown in Figure 4.1-4.

FIGURE 4.1-4
Preprocessing at the receiver.

Since $G$ is invertible and the detector has access to $\boldsymbol{\rho}$, it can apply $G^{-1}$ to $\boldsymbol{\rho}$ to obtain
$G^{-1}(\boldsymbol{\rho}) = G^{-1}(G(\mathbf{r})) = \mathbf{r}$. The detector now has access to both $\boldsymbol{\rho}$ and $\mathbf{r}$; therefore the
optimal detection rule is

$$\begin{aligned}
\hat{m} &= \arg\max_{1 \le m \le M} P_m\, p(\mathbf{r}, \boldsymbol{\rho} \mid \mathbf{s}_m) \\
&= \arg\max_{1 \le m \le M} P_m\, p(\mathbf{r} \mid \mathbf{s}_m)\, p(\boldsymbol{\rho} \mid \mathbf{r}) \\
&= \arg\max_{1 \le m \le M} P_m\, p(\mathbf{r} \mid \mathbf{s}_m)
\end{aligned}$$ (4.1-21)

where we have used the fact that $\boldsymbol{\rho}$ is a function of $\mathbf{r}$ and hence, when $\mathbf{r}$ is given, $\boldsymbol{\rho}$
does not depend on $\mathbf{s}_m$. From Equation 4.1-21 it is clear that the optimal detector based
on the observation of $\boldsymbol{\rho}$ makes the same decision as the optimal detector based on the
observation of $\mathbf{r}$. In other words, an invertible preprocessing of the received information
does not change the optimality of the receiver.

EXAMPLE 4.1-3. Let us assume the received vector is of the form

$$\mathbf{r} = \mathbf{s}_m + \mathbf{n}$$

where $\mathbf{n}$ is a nonwhite (colored) noise. Let us further assume that there exists an
invertible whitening operator denoted by matrix $\mathbf{W}$ such that $\boldsymbol{\nu} = \mathbf{W}\mathbf{n}$ is a white
vector. Then we can consider

$$\boldsymbol{\rho} = \mathbf{W}\mathbf{r} = \mathbf{W}\mathbf{s}_m + \boldsymbol{\nu}$$

which is equivalent to a channel with white noise for detection without degrading the
performance. The linear operation denoted by $\mathbf{W}$ is called a whitening filter.


4.2
WAVEFORM AND VECTOR AWGN CHANNELS

The waveform AWGN channel is described by the input-output relation

$$r(t) = s_m(t) + n(t)$$ (4.2-1)

where $s_m(t)$ is one of the possible $M$ signals $\{s_1(t), s_2(t), \ldots, s_M(t)\}$, each selected
with prior probability $P_m$, and $n(t)$ is a zero-mean white Gaussian process with power
spectral density $\frac{N_0}{2}$. Let us assume that using the Gram-Schmidt procedure, we have
derived an orthonormal basis $\{\phi_j(t), 1 \le j \le N\}$ for representation of the signals and,
using this set, the vector representation of the signals is given by $\{\mathbf{s}_m, 1 \le m \le M\}$.
The noise process cannot be completely expanded in terms of the basis $\{\phi_j(t)\}_{j=1}^{N}$.
We decompose the noise process $n(t)$ into two components. One component, denoted
by $n_1(t)$, is the part of the noise process that can be expanded in terms of $\{\phi_j(t)\}_{j=1}^{N}$, i.e.,
the projection of the noise onto the space spanned by these basis functions; and the
other part, denoted by $n_2(t)$, is the part that cannot be expressed in terms of this basis.
With this definition we have

$$n_1(t) = \sum_{j=1}^{N} n_j \phi_j(t), \quad \text{where } n_j = \langle n(t), \phi_j(t) \rangle$$ (4.2-2)

and

$$n_2(t) = n(t) - n_1(t)$$ (4.2-3)

Noting that

$$s_m(t) = \sum_{j=1}^{N} s_{mj} \phi_j(t), \quad \text{where } s_{mj} = \langle s_m(t), \phi_j(t) \rangle$$ (4.2-4)

and using Equations 4.2-2 and 4.2-3, we can write Equation 4.2-1 as

$$r(t) = \sum_{j=1}^{N} (s_{mj} + n_j) \phi_j(t) + n_2(t)$$ (4.2-5)

By defining

$$r_j = s_{mj} + n_j$$ (4.2-6)

where

$$r_j = \langle s_m(t), \phi_j(t) \rangle + \langle n(t), \phi_j(t) \rangle = \langle s_m(t) + n(t), \phi_j(t) \rangle = \langle r(t), \phi_j(t) \rangle$$ (4.2-7)

we have

$$r(t) = \sum_{j=1}^{N} r_j \phi_j(t) + n_2(t), \quad \text{where } r_j = \langle r(t), \phi_j(t) \rangle$$ (4.2-8)

From Example 2.8-1 we know that the $n_j$'s are iid zero-mean Gaussian random variables
each with variance $\frac{N_0}{2}$. This result can also be directly shown, by noting that the $n_j$'s
defined by

$$n_j = \int_{-\infty}^{\infty} n(t) \phi_j(t)\, dt$$ (4.2-9)

are linear combinations of the Gaussian random process $n(t)$, and therefore they are
Gaussian. Their mean is given by

$$E[n_j] = E\left[\int_{-\infty}^{\infty} n(t) \phi_j(t)\, dt\right] = \int_{-\infty}^{\infty} E[n(t)]\, \phi_j(t)\, dt = 0$$ (4.2-10)


where the last equality holds since $n(t)$ is zero-mean, i.e., $E[n(t)] = 0$.
We can also find the covariance of $n_i$ and $n_j$ as

$$\begin{aligned}
\mathrm{COV}[n_i n_j] &= E[n_i n_j] - E[n_i] E[n_j] \\
&= E\left[\int_{-\infty}^{\infty} n(t) \phi_i(t)\, dt \int_{-\infty}^{\infty} n(s) \phi_j(s)\, ds\right] \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E[n(t) n(s)]\, \phi_i(t) \phi_j(s)\, dt\, ds \\
&= \frac{N_0}{2} \int_{-\infty}^{\infty} \left[\int_{-\infty}^{\infty} \delta(t - s) \phi_i(t)\, dt\right] \phi_j(s)\, ds \\
&= \frac{N_0}{2} \int_{-\infty}^{\infty} \phi_i(s) \phi_j(s)\, ds \\
&= \begin{cases} \frac{N_0}{2} & i = j \\ 0 & i \ne j \end{cases}
\end{aligned}$$ (4.2-11)

where we have used the facts that $n_i$ and $n_j$ are zero-mean, and since $n(t)$ is white,
its autocorrelation function is $\frac{N_0}{2}\delta(\tau)$. In the last step we applied the orthonormality of
$\{\phi_j(t)\}$. Equation 4.2-11 shows that for $i \ne j$, $n_i$ and $n_j$ are uncorrelated and since they
are Gaussian, they are independent as well. It also shows that each $n_j$ has a variance
equal to $\frac{N_0}{2}$.
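Equation 4.2-11 can be illustrated with a discrete-time simulation. This is only a sketch under stated assumptions: white noise of power spectral density N0/2 is approximated by independent Gaussian samples of variance N0/(2 dt), integrals are replaced by Riemann sums, and the two overlapping orthonormal basis functions (a constant and a sign-changing rectangle on a unit interval) are hypothetical choices made for the illustration.

```python
import math
import random


def projected_noise_moments(n0: float, trials: int, samples: int = 20, seed: int = 0):
    """Return sample estimates of (E[n_1^2], E[n_2^2], E[n_1 n_2]) for the projected noise."""
    rng = random.Random(seed)
    duration = 1.0
    dt = duration / samples
    half = samples // 2
    amp = 1.0 / math.sqrt(duration)
    # Two orthonormal functions on [0, duration]: a constant and a +/- rectangle.
    phi1 = [amp] * samples
    phi2 = [amp if k < half else -amp for k in range(samples)]
    # Sampled white noise: per-sample variance N0 / (2 dt) approximates (N0/2) delta(tau).
    sigma = math.sqrt(n0 / (2.0 * dt))
    s11 = s22 = s12 = 0.0
    for _ in range(trials):
        noise = [rng.gauss(0.0, sigma) for _ in range(samples)]
        n1 = sum(x * p for x, p in zip(noise, phi1)) * dt  # n_1 = <n, phi_1>
        n2 = sum(x * p for x, p in zip(noise, phi2)) * dt  # n_2 = <n, phi_2>
        s11 += n1 * n1
        s22 += n2 * n2
        s12 += n1 * n2
    return s11 / trials, s22 / trials, s12 / trials
```

Under these assumptions both sample variances should come out close to N0/2 and the sample covariance close to zero, in line with Equation 4.2-11.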

Now we study the properties of $n_2(t)$. We first observe that since the $n_j$'s are
jointly Gaussian random variables, the process $n_1(t)$ is a Gaussian process and thus
$n_2(t) = n(t) - n_1(t)$, which is a linear combination of two jointly Gaussian processes,
is itself a Gaussian process. At any given $t$ we have

$$\begin{aligned}
\mathrm{COV}[n_j n_2(t)] &= E[n_j n_2(t)] \\
&= E[n_j n(t)] - E[n_j n_1(t)] \\
&= E\left[n(t) \int_{-\infty}^{\infty} n(s) \phi_j(s)\, ds\right] - E\left[n_j \sum_{i=1}^{N} n_i \phi_i(t)\right] \\
&= \frac{N_0}{2} \int_{-\infty}^{\infty} \delta(t - s) \phi_j(s)\, ds - \frac{N_0}{2} \phi_j(t) \\
&= \frac{N_0}{2} \phi_j(t) - \frac{N_0}{2} \phi_j(t) \\
&= 0
\end{aligned}$$ (4.2-12)

where we have used the fact that $E[n_j n_i] = 0$, except when $i = j$, in which case
$E[n_j n_j] = N_0/2$.

Equation 4.2-12 shows that $n_2(t)$ is uncorrelated with all $n_j$'s, and since they are
jointly Gaussian, $n_2(t)$ is independent of all $n_j$'s, and therefore it is independent of $n_1(t)$.

Since $n_2(t)$ is independent of $s_m(t)$ and $n_1(t)$, we conclude that in Equation 4.2-8,
the two components of $r(t)$, namely, $\sum_j r_j \phi_j(t)$ and $n_2(t)$, are independent. Since the
first component is the only component that carries the transmitted signal, and the second
component is independent of the first component, the second component cannot
provide any information about the transmitted signal and therefore has no effect in the
detection process and can be ignored without sacrificing the optimality of the detector.
In other words, $n_2(t)$ is irrelevant information for optimal detection.

From the above discussion it is clear that for the design of the optimal detector, the
AWGN waveform channel of the form

$$r(t) = s_m(t) + n(t), \quad 1 \le m \le M$$ (4.2-13)

is equivalent to the $N$-dimensional vector channel

$$\mathbf{r} = \mathbf{s}_m + \mathbf{n}, \quad 1 \le m \le M$$ (4.2-14)


4.2-1 Optimal Detection for the Vector AWGN Channel 


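The additive AWGN vector channel is the vector equivalent channel to the waveform
AWGN channel and is described by Equation 4.2-14 in which the components of the
noise vector are iid zero-mean Gaussian random variables with variance $\frac{N_0}{2}$. The joint
PDF of the noise vector is given by Equation 4.1-4. The MAP detector for this channel
is given by

$$\begin{aligned}
\hat{m} &= \arg\max_{1 \le m \le M} \left[P_m\, p(\mathbf{r} \mid \mathbf{s}_m)\right] \\
&= \arg\max_{1 \le m \le M} P_m \left[p_n(\mathbf{r} - \mathbf{s}_m)\right] \\
&= \arg\max_{1 \le m \le M} \left[P_m \left(\frac{1}{\sqrt{\pi N_0}}\right)^{N} e^{-\frac{\|\mathbf{r} - \mathbf{s}_m\|^2}{N_0}}\right] \\
&\overset{(a)}{=} \arg\max_{1 \le m \le M} \left[P_m\, e^{-\frac{\|\mathbf{r} - \mathbf{s}_m\|^2}{N_0}}\right] \\
&\overset{(b)}{=} \arg\max_{1 \le m \le M} \left[\ln P_m - \frac{\|\mathbf{r} - \mathbf{s}_m\|^2}{N_0}\right] \\
&\overset{(c)}{=} \arg\max_{1 \le m \le M} \left[\frac{N_0}{2} \ln P_m - \frac{1}{2} \|\mathbf{r} - \mathbf{s}_m\|^2\right] \\
&= \arg\max_{1 \le m \le M} \left[\frac{N_0}{2} \ln P_m - \frac{1}{2}\left(\|\mathbf{r}\|^2 + \|\mathbf{s}_m\|^2 - 2\,\mathbf{r} \cdot \mathbf{s}_m\right)\right] \\
&\overset{(d)}{=} \arg\max_{1 \le m \le M} \left[\frac{N_0}{2} \ln P_m - \frac{1}{2} \mathcal{E}_m + \mathbf{r} \cdot \mathbf{s}_m\right] \\
&\overset{(e)}{=} \arg\max_{1 \le m \le M} \left[\eta_m + \mathbf{r} \cdot \mathbf{s}_m\right]
\end{aligned}$$ (4.2-15)

where we have used the following steps in simplifying the expression:

(a): $\left(\frac{1}{\sqrt{\pi N_0}}\right)^{N}$ is a positive constant and can be dropped.

(b): $\ln(\cdot)$ is an increasing function.

(c): $\frac{N_0}{2}$ is positive and multiplying by a positive number does not affect the result of
arg max.

(d): $\|\mathbf{r}\|^2$ was dropped since it does not depend on $m$, and $\|\mathbf{s}_m\|^2 = \mathcal{E}_m$.

(e): We have defined

$$\eta_m = \frac{N_0}{2} \ln P_m - \frac{1}{2} \mathcal{E}_m$$ (4.2-16)

as the bias term.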


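From Equation 4.2-15, it is clear that the optimal (MAP) decision rule for an
AWGN vector channel is given by

$$\hat{m} = \arg\max_{1 \le m \le M} \left[\eta_m + \mathbf{r} \cdot \mathbf{s}_m\right], \qquad \eta_m = \frac{N_0}{2} \ln P_m - \frac{1}{2} \mathcal{E}_m$$ (4.2-17)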


In the special case where the signals are equiprobable, i.e., $P_m = 1/M$ for all $m$,
this relation becomes somewhat simpler. In this case Equation 4.2-15 at step (c) can
be written as

$$\begin{aligned}
\hat{m} &= \arg\max_{1 \le m \le M} \left[\frac{N_0}{2} \ln \frac{1}{M} - \frac{1}{2} \|\mathbf{r} - \mathbf{s}_m\|^2\right] \\
&= \arg\max_{1 \le m \le M} \left[-\|\mathbf{r} - \mathbf{s}_m\|^2\right] \\
&= \arg\min_{1 \le m \le M} \|\mathbf{r} - \mathbf{s}_m\|
\end{aligned}$$ (4.2-18)

where we have used the fact that maximizing $-\|\mathbf{r} - \mathbf{s}_m\|^2$ is equivalent to minimizing
its negative, i.e., $\|\mathbf{r} - \mathbf{s}_m\|^2$, which is equivalent to minimizing its square root $\|\mathbf{r} - \mathbf{s}_m\|$.

A geometric interpretation of Equation 4.2-18 is particularly convenient. The receiver
receives $\mathbf{r}$ and looks among all $\mathbf{s}_m$ to find the one that is closest to $\mathbf{r}$ using standard
Euclidean distance. Such a detector is called a nearest-neighbor, or minimum-distance,
detector. Also note that in this case, since the signals are equiprobable, the MAP and the
ML detector coincide, and both are equivalent to the minimum-distance detector. In this
case the boundary between decision regions $D_m$ and $D_{m'}$ is the set of points that are equidistant
from $\mathbf{s}_m$ and $\mathbf{s}_{m'}$, which is the perpendicular bisector of the line connecting these two
signal points. This boundary in general is a hyperplane. For the case of $N = 2$ the
boundary is a line, and for $N = 3$ it is a plane. These hyperplanes completely determine
the decision regions. An example of a two-dimensional constellation ($N = 2$)
with four signal points ($M = 4$) is shown in Figure 4.2-1. The solid lines denote the
boundaries of the decision regions, which are the perpendicular bisectors of the dashed
lines connecting the signal points.
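The MAP rule of Equation 4.2-17 and its minimum-distance special case in Equation 4.2-18 can be sketched in a few lines. The four-point constellation, the priors, and the function names used here are illustrative assumptions; the priors are assumed strictly positive so that the logarithm is defined.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def map_awgn_detect(r, signals, priors, n0):
    """Equation 4.2-17: maximize eta_m + r . s_m with eta_m = (N0/2) ln P_m - E_m / 2."""
    scores = []
    for s_m, p_m in zip(signals, priors):
        eta_m = 0.5 * n0 * math.log(p_m) - 0.5 * dot(s_m, s_m)  # bias term
        scores.append(eta_m + dot(r, s_m))
    return max(range(len(scores)), key=scores.__getitem__) + 1  # 1-based message index


def min_distance_detect(r, signals):
    """Equation 4.2-18: pick the signal point closest to r in Euclidean distance."""
    distances = [sum((x - y) ** 2 for x, y in zip(r, s_m)) for s_m in signals]
    return min(range(len(distances)), key=distances.__getitem__) + 1


# Hypothetical constellation with N = 2 and M = 4.
constellation = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
```

With equal priors the two functions return the same decision for any received vector that is not exactly on a boundary. With strongly unequal priors the MAP decision can differ from the nearest signal point, because the bias terms shift the boundaries toward the less likely messages.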

When the signals are both equiprobable and have equal energy, the bias terms
defined as $\eta_m = \frac{N_0}{2} \ln P_m - \frac{1}{2} \mathcal{E}_m$ are independent of $m$ and can be dropped from
Equation 4.2-17. The optimal detection rule in this case reduces to

$$\hat{m} = \arg\max_{1 \le m \le M} \mathbf{r} \cdot \mathbf{s}_m$$ (4.2-19)

FIGURE 4.2-1
The decision regions for equiprobable signaling.

In general, the decision region $D_m$ is given as

$$D_m = \{\mathbf{r} \in \mathbb{R}^N : \mathbf{r} \cdot \mathbf{s}_m + \eta_m > \mathbf{r} \cdot \mathbf{s}_{m'} + \eta_{m'}, \text{ for all } 1 \le m' \le M \text{ and } m' \ne m\}$$ (4.2-20)

Note that each decision region is described in terms of at most $M - 1$ inequalities. In
some cases some of these inequalities are dominated by the others and are redundant.
Also note that each boundary is of the general form of

$$\mathbf{r} \cdot (\mathbf{s}_m - \mathbf{s}_{m'}) > \eta_{m'} - \eta_m$$ (4.2-21)

which is the equation of a hyperplane. Therefore the boundaries of the decision regions
in general are hyperplanes.

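From Equation 2.2-47, we know that

$$\mathbf{r} \cdot \mathbf{s}_m = \int_{-\infty}^{\infty} r(t) s_m(t)\, dt$$ (4.2-22)

and

$$\mathcal{E}_m = \|\mathbf{s}_m\|^2 = \int_{-\infty}^{\infty} s_m^2(t)\, dt$$ (4.2-23)

Therefore, the optimal MAP detection rule in an AWGN channel can be written in the
form

$$\hat{m} = \arg\max_{1 \le m \le M} \left[\frac{N_0}{2} \ln P_m + \int_{-\infty}^{\infty} r(t) s_m(t)\, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_m^2(t)\, dt\right]$$ (4.2-24)

and the ML detector has the following form:

$$\hat{m} = \arg\max_{1 \le m \le M} \left[\int_{-\infty}^{\infty} r(t) s_m(t)\, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_m^2(t)\, dt\right]$$ (4.2-25)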


At this point it is convenient to introduce three metrics that we will use frequently
in the future. We define the distance metric as

$$D(\mathbf{r}, \mathbf{s}_m) = \|\mathbf{r} - \mathbf{s}_m\|^2 = \int_{-\infty}^{\infty} (r(t) - s_m(t))^2\, dt$$ (4.2-26)

denoting the square of the Euclidean distance between $\mathbf{r}$ and $\mathbf{s}_m$. The modified distance
metric is defined as

$$D'(\mathbf{r}, \mathbf{s}_m) = -2\,\mathbf{r} \cdot \mathbf{s}_m + \|\mathbf{s}_m\|^2$$ (4.2-27)

and is equal to the distance metric when the term $\|\mathbf{r}\|^2$, which does not depend on $m$,
is removed. The correlation metric is defined as the negative of the modified distance
metric and is given by

$$C(\mathbf{r}, \mathbf{s}_m) = 2\,\mathbf{r} \cdot \mathbf{s}_m - \|\mathbf{s}_m\|^2 = 2 \int_{-\infty}^{\infty} r(t) s_m(t)\, dt - \int_{-\infty}^{\infty} s_m^2(t)\, dt$$ (4.2-28)

It is important to note that using the term metric is just for convenience. In general,
none of these quantities is a metric in a mathematical sense. With these definitions the
optimal detection rule (MAP rule) in general can be written as

$$\begin{aligned}
\hat{m} &= \arg\max_{1 \le m \le M} \left[N_0 \ln P_m - D(\mathbf{r}, \mathbf{s}_m)\right] \\
&= \arg\max_{1 \le m \le M} \left[N_0 \ln P_m + C(\mathbf{r}, \mathbf{s}_m)\right]
\end{aligned}$$ (4.2-29)

and the ML detection rule becomes

$$\hat{m} = \arg\max_{1 \le m \le M} C(\mathbf{r}, \mathbf{s}_m)$$ (4.2-30)
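The relations among the three metrics are easy to verify numerically. The sketch below works with vector representations only; the function names and the example vectors are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def distance_metric(r, s):
    """D(r, s) = ||r - s||^2, Equation 4.2-26 in vector form."""
    return sum((x - y) ** 2 for x, y in zip(r, s))


def modified_distance_metric(r, s):
    """D'(r, s) = -2 r . s + ||s||^2, Equation 4.2-27."""
    return -2.0 * dot(r, s) + dot(s, s)


def correlation_metric(r, s):
    """C(r, s) = 2 r . s - ||s||^2, Equation 4.2-28."""
    return 2.0 * dot(r, s) - dot(s, s)


def ml_by_correlation(r, signals):
    """ML rule of Equation 4.2-30: maximize the correlation metric."""
    scores = [correlation_metric(r, s) for s in signals]
    return max(range(len(scores)), key=scores.__getitem__) + 1
```

For any r and s, `distance_metric` equals `modified_distance_metric` plus the squared norm of r, and `correlation_metric` is the negative of `modified_distance_metric`, so maximizing C selects the same signal as minimizing D.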


Optimal Detection for Binary Antipodal Signaling 

In a binary antipodal signaling scheme $s_1(t) = s(t)$ and $s_2(t) = -s(t)$. The probabilities
of messages 1 and 2 are $p$ and $1 - p$, respectively. This is obviously a case with
$N = 1$, and the vector representations of the two signals are just scalars with $s_1 = \sqrt{\mathcal{E}_s}$
and $s_2 = -\sqrt{\mathcal{E}_s}$, where $\mathcal{E}_s$ is energy in each signal and is equal to $\mathcal{E}_b$. Following
Equation 4.2-20, the decision region $D_1$ is given as

$$\begin{aligned}
D_1 &= \left\{r : r\sqrt{\mathcal{E}_b} + \frac{N_0}{2} \ln p - \frac{1}{2}\mathcal{E}_b > -r\sqrt{\mathcal{E}_b} + \frac{N_0}{2} \ln(1 - p) - \frac{1}{2}\mathcal{E}_b\right\} \\
&= \left\{r : r > \frac{N_0}{4\sqrt{\mathcal{E}_b}} \ln \frac{1 - p}{p}\right\} \\
&= \{r : r > r_{th}\}
\end{aligned}$$ (4.2-31)

where the threshold $r_{th}$ is defined as

$$r_{th} = \frac{N_0}{4\sqrt{\mathcal{E}_b}} \ln \frac{1 - p}{p}$$ (4.2-32)

FIGURE 4.2-2
The decision regions for antipodal signaling.

The constellation and the decision regions are shown in Figure 4.2-2.

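Note that as $p \to 0$, we have $r_{th} \to \infty$ and the entire real line becomes $D_2$,
and when $p \to 1$, the entire line becomes $D_1$, as expected. Also note that when
$p = \frac{1}{2}$, i.e., when the messages are equiprobable, $r_{th} = 0$ and the decision rule reduces
to a minimum-distance rule. To derive the error probability for this system, we use
Equation 4.1-15. This yields

$$\begin{aligned}
P_e &= \sum_{m=1}^{2} P_m \sum_{\substack{1 \le m' \le 2 \\ m' \ne m}} \int_{D_{m'}} p\left(r \mid s_m\right) dr \\
&= p \int_{D_2} p\left(r \mid s = \sqrt{\mathcal{E}_b}\right) dr + (1 - p) \int_{D_1} p\left(r \mid s = -\sqrt{\mathcal{E}_b}\right) dr \\
&= p \int_{-\infty}^{r_{th}} p\left(r \mid s = \sqrt{\mathcal{E}_b}\right) dr + (1 - p) \int_{r_{th}}^{\infty} p\left(r \mid s = -\sqrt{\mathcal{E}_b}\right) dr \\
&= p\, P\left[\mathcal{N}\left(\sqrt{\mathcal{E}_b}, \tfrac{N_0}{2}\right) < r_{th}\right] + (1 - p)\, P\left[\mathcal{N}\left(-\sqrt{\mathcal{E}_b}, \tfrac{N_0}{2}\right) > r_{th}\right] \\
&= p\, Q\left(\frac{\sqrt{\mathcal{E}_b} - r_{th}}{\sqrt{N_0/2}}\right) + (1 - p)\, Q\left(\frac{r_{th} + \sqrt{\mathcal{E}_b}}{\sqrt{N_0/2}}\right)
\end{aligned}$$ (4.2-33)

where in the last step we have used Equation 2.3-12. In the special case where $p = \frac{1}{2}$,
we have $r_{th} = 0$ and the error probability simplifies to

$$P_e = Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)$$ (4.2-34)

Also note that since the system is binary, the error probability for each message is equal
to the bit error probability, i.e., $P_b = P_e$.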
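Equations 4.2-32 through 4.2-34 translate directly into a short numerical sketch. The Q-function is expressed through the complementary error function of the standard library; the function names and the parameter values in the usage note are assumptions for illustration.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def antipodal_threshold(eb: float, n0: float, p: float) -> float:
    """Threshold r_th of Equation 4.2-32; p is the prior of message 1, with 0 < p < 1."""
    return n0 / (4.0 * math.sqrt(eb)) * math.log((1.0 - p) / p)


def antipodal_error_probability(eb: float, n0: float, p: float) -> float:
    """Error probability of the MAP detector, last line of Equation 4.2-33."""
    r_th = antipodal_threshold(eb, n0, p)
    sigma = math.sqrt(n0 / 2.0)   # standard deviation of the noise component
    root_eb = math.sqrt(eb)
    return (p * q_function((root_eb - r_th) / sigma)
            + (1.0 - p) * q_function((r_th + root_eb) / sigma))
```

For p = 1/2 the threshold is zero and the result matches Equation 4.2-34. For skewed priors, for example p = 0.9 with unit bit energy and unit N0, the threshold moves toward the less likely signal and the resulting error probability is smaller than in the equiprobable case.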


Error Probability for Equiprobable Binary Signaling Schemes 

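In this case the transmitter transmits one of the two equiprobable signals $s_1(t)$ and
$s_2(t)$ over the AWGN channel. Since the signals are equiprobable, the two decision
regions are separated by the perpendicular bisector of the line connecting $\mathbf{s}_1$ and $\mathbf{s}_2$.
By symmetry, error probabilities when $\mathbf{s}_1$ or $\mathbf{s}_2$ is transmitted are equal; therefore
$P_b = P[\text{error} \mid \mathbf{s}_1 \text{ sent}]$. The decision regions and the perpendicular bisector of the line
connecting $\mathbf{s}_1$ and $\mathbf{s}_2$ are shown in Figure 4.2-3.

FIGURE 4.2-3
Decision regions for binary equiprobable signals.

Since we are assuming that $\mathbf{s}_1$ is sent, an error occurs if $\mathbf{r}$ is in $D_2$, which means the
distance between the projection of $\mathbf{r} - \mathbf{s}_1$ on $\mathbf{s}_2 - \mathbf{s}_1$, i.e., point A, from $\mathbf{s}_1$ is larger than
$\frac{d_{12}}{2}$, where $d_{12} = \|\mathbf{s}_2 - \mathbf{s}_1\|$. Note that since $\mathbf{s}_1$ is sent, $\mathbf{n} = \mathbf{r} - \mathbf{s}_1$, and the projection of
$\mathbf{r} - \mathbf{s}_1$ on $\mathbf{s}_2 - \mathbf{s}_1$ becomes equal to $\frac{\mathbf{n} \cdot (\mathbf{s}_2 - \mathbf{s}_1)}{d_{12}}$. Therefore, the error probability is given by

$$P_b = P\left[\frac{\mathbf{n} \cdot (\mathbf{s}_2 - \mathbf{s}_1)}{d_{12}} > \frac{d_{12}}{2}\right]$$ (4.2-35)

or

$$P_b = P\left[\mathbf{n} \cdot (\mathbf{s}_2 - \mathbf{s}_1) > \frac{d_{12}^2}{2}\right]$$ (4.2-36)

We note that $\mathbf{n} \cdot (\mathbf{s}_2 - \mathbf{s}_1)$ is a zero-mean Gaussian random variable with variance $\frac{d_{12}^2 N_0}{2}$;
therefore, using Equation 2.3-12, we obtain

$$P_b = Q\left(\frac{d_{12}^2 / 2}{d_{12}\sqrt{N_0/2}}\right) = Q\left(\sqrt{\frac{d_{12}^2}{2N_0}}\right)$$ (4.2-37)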


Equation 4.2-37 is very general and applies to all binary equiprobable signaling
systems regardless of the shape of the signals. Since $Q(\cdot)$ is a decreasing function, in
order to minimize the error probability, the distance between signal points has to be
maximized. The distance $d_{12}$ is obtained from

$$d_{12}^2 = \int_{-\infty}^{\infty} (s_1(t) - s_2(t))^2\, dt$$ (4.2-38)

In the special case that the binary signals are equiprobable and have equal energy,
i.e., when $\mathcal{E}_{s_1} = \mathcal{E}_{s_2} = \mathcal{E}$, we can expand Equation 4.2-38 and get

$$d_{12}^2 = \mathcal{E}_{s_1} + \mathcal{E}_{s_2} - 2\langle s_1(t), s_2(t) \rangle = 2\mathcal{E}(1 - \rho)$$ (4.2-39)

where $\rho$ is the cross-correlation coefficient between $s_1(t)$ and $s_2(t)$ defined in
Equation 2.1-25. Since $-1 \le \rho \le 1$, we observe from Equation 4.2-39 that the binary
signals are maximally separated when $\rho = -1$, i.e., when the signals are antipodal. In
this case the error probability of the system is minimized.

FIGURE 4.2-4
Signal constellation and decision regions for
equiprobable binary orthogonal signaling.



Optimal Detection for Binary Orthogonal Signaling 

For binary orthogonal signals we have

$$\int_{-\infty}^{\infty} s_i(t) s_j(t)\, dt = \begin{cases} \mathcal{E} & i = j \\ 0 & i \ne j \end{cases} \qquad 1 \le i, j \le 2$$ (4.2-40)

Note that since the system is binary, $\mathcal{E}_b = \mathcal{E}$. Here we choose $\phi_j(t) = \frac{s_j(t)}{\sqrt{\mathcal{E}_b}}$ for $j = 1, 2$,
and the vector representations of the signal set become

$$\mathbf{s}_1 = \left(\sqrt{\mathcal{E}_b}, 0\right), \qquad \mathbf{s}_2 = \left(0, \sqrt{\mathcal{E}_b}\right)$$ (4.2-41)

The constellation and the optimal decision regions for the case of equiprobable signals
are shown in Figure 4.2-4.

For this signaling scheme it is clear that $d = \sqrt{2\mathcal{E}_b}$ and

$$P_b = Q\left(\sqrt{\frac{d^2}{2N_0}}\right) = Q\left(\sqrt{\frac{\mathcal{E}_b}{N_0}}\right)$$ (4.2-42)

Comparing this result with the error probability of binary antipodal signaling given in
Equation 4.2-34, we see that binary orthogonal signaling requires twice the energy
per bit of a binary antipodal signaling system to provide the same error probability.
Therefore in terms of power efficiency, binary orthogonal signaling underperforms
binary antipodal signaling by a factor of 2, or equivalently by 3 dB.
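The 3-dB gap follows directly from Equation 4.2-37, which depends on the signals only through their distance. The sketch below evaluates that formula for the antipodal and orthogonal constellations; the function names and numeric values are illustrative assumptions.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_error_probability(s1, s2, n0: float) -> float:
    """Equation 4.2-37 for two equiprobable signal vectors: Q(sqrt(d_12^2 / (2 N0)))."""
    d_squared = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return q_function(math.sqrt(d_squared / (2.0 * n0)))


def antipodal_pair(eb: float):
    """Vector representation of binary antipodal signals with bit energy eb."""
    return (math.sqrt(eb),), (-math.sqrt(eb),)


def orthogonal_pair(eb: float):
    """Vector representation of binary orthogonal signals (Equation 4.2-41)."""
    return (math.sqrt(eb), 0.0), (0.0, math.sqrt(eb))
```

Evaluating `binary_error_probability` on `orthogonal_pair(2 * eb)` gives the same value as on `antipodal_pair(eb)`, which is the factor of 2 in energy noted above.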

The term

$$\gamma_b = \frac{\mathcal{E}_b}{N_0}$$ (4.2-43)

which appears in the expression for error probability of many signaling systems is
called the signal-to-noise ratio per bit, or SNR per bit, or simply the SNR of the
communication system. Plots of error probability as a function of SNR/bit for binary
antipodal and binary orthogonal signaling are shown in Figure 4.2-5. It is clear from
this figure that the plot for orthogonal signaling is the result of a 3-dB shift of the plot
for antipodal signaling.

FIGURE 4.2-5
Error probability for binary antipodal and binary
orthogonal signaling (horizontal axis: SNR per bit, $\gamma_b$, in dB).


4.2-2 Implementation of the Optimal Receiver for AWGN Channels

In this section we present different implementations of the optimal (MAP) receiver for 
the AWGN channel. All these structures are equivalent in performance and result in 
minimum error probability. The underlying relation that is implemented by all these 
structures is Equation 4.2-17 which describes the MAP receiver for an AWGN channel. 

The Correlation Receiver 

An optimal receiver for the AWGN channel implements the MAP decision rule given
by Equation 4.2-44.

$$\hat{m} = \arg\max_{1 \le m \le M} \left[\eta_m + \mathbf{r} \cdot \mathbf{s}_m\right], \quad \text{where } \eta_m = \frac{N_0}{2} \ln P_m - \frac{1}{2}\mathcal{E}_m$$ (4.2-44)

However, the receiver has access to $r(t)$ and not the vector $\mathbf{r}$. The first step to implement
Equation 4.2-44 at the receiver is to derive $\mathbf{r}$ from the received signal $r(t)$. Using the
relation

$$r_j = \int_{-\infty}^{\infty} r(t) \phi_j(t)\, dt$$ (4.2-45)

the receiver multiplies $r(t)$ by each basis function $\phi_j(t)$ and integrates the result to
find all components of $\mathbf{r}$. In the next step it finds the inner product of $\mathbf{r}$ with each
$\mathbf{s}_m$, $1 \le m \le M$, and finally adds the bias terms $\eta_m$, compares the results, and
chooses the $m$ that maximizes the result. Since the received signal $r(t)$ is correlated
with each $\phi_j(t)$, this implementation of the optimal receiver is called a correlation
receiver.

The structure of a correlation receiver is shown in Figure 4.2-6.

FIGURE 4.2-6
The structure of a correlation receiver with N correlators.

Note that in Figure 4.2-6, the $\eta_m$'s and $\mathbf{s}_m$'s are independent of the received signal
$r(t)$; therefore they can be computed once and stored in a memory for later access. The
parts of this diagram that need constant computation are the correlators that compute
$\mathbf{r} \cdot \mathbf{s}_m$ for $1 \le m \le M$.
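The correlation receiver can be sketched in discrete time by replacing the integral of Equation 4.2-45 with a Riemann sum over samples of r(t). Everything specific below is an assumption made for illustration: the sampling step, the two rectangular basis functions, and the function names.

```python
import math


def correlate(x, y, dt: float) -> float:
    """Riemann-sum approximation of the integral of x(t) y(t) dt."""
    return sum(a * b for a, b in zip(x, y)) * dt


def correlation_receiver(r_samples, basis, signal_vectors, priors, n0: float, dt: float):
    """Project r(t) onto each basis function, then apply the rule of Equation 4.2-44."""
    # Step 1: N correlators produce the components r_j = <r(t), phi_j(t)>.
    r = [correlate(r_samples, phi_j, dt) for phi_j in basis]
    # Step 2: add the bias eta_m to r . s_m and pick the largest score.
    scores = []
    for s_m, p_m in zip(signal_vectors, priors):
        eta_m = 0.5 * n0 * math.log(p_m) - 0.5 * sum(c * c for c in s_m)
        scores.append(eta_m + sum(a * b for a, b in zip(r, s_m)))
    decision = max(range(len(scores)), key=scores.__getitem__) + 1
    return decision, r


def rectangular_basis(samples: int, duration: float):
    """Two orthonormal rectangular pulses occupying the two halves of [0, duration]."""
    half = samples // 2
    amp = math.sqrt(2.0 / duration)
    phi1 = [amp if k < half else 0.0 for k in range(samples)]
    phi2 = [0.0 if k < half else amp for k in range(samples)]
    return [phi1, phi2]
```

In a noise-free test, feeding the sampled waveform of one of the signals into `correlation_receiver` returns that signal's index and a vector `r` equal, up to rounding, to its vector representation. The bias terms and signal vectors do not depend on the received samples and could be precomputed, as the text notes.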

Another implementation of the optimal detector is possible by noting that the
optimal detection rule given in Equation 4.2-44 is equivalent to

$$\hat{m} = \arg\max_{1 \le m \le M} \left[\eta_m + \int_{-\infty}^{\infty} r(t) s_m(t)\, dt\right], \quad \text{where } \eta_m = \frac{N_0}{2} \ln P_m - \frac{1}{2}\mathcal{E}_m$$ (4.2-46)

Therefore, $\mathbf{r} \cdot \mathbf{s}_m$ can be directly found by correlating $r(t)$ with the $s_m(t)$'s. Figure 4.2-7
shows this implementation, which is a second version of the correlation receiver.

Note that although the structure shown in Figure 4.2-7 looks simpler than the
structure shown in Figure 4.2-6, since in most cases $N < M$ (and in fact $N \ll M$), the
correlation receiver of Figure 4.2-6 is usually the preferred implementation method.

The correlation receiver requires $N$ or $M$ correlators, i.e., multipliers followed
by integrators. We now present an alternative implementation of the optimal receiver
called the matched filter receiver.


The Matched Filter Receiver 

In both correlation receiver implementations we compute quantities of the form

$$r_x = \int_{-\infty}^{\infty} r(t) x(t)\, dt$$ (4.2-47)

where $x(t)$ is either $\phi_j(t)$ or $s_m(t)$. If we define $h(t) = x(T - t)$, where $T$ is arbitrary,
and consider a filter with impulse response $h(t)$, this filter is called a filter matched to
$x(t)$, or a matched filter. If the input $r(t)$ is applied to this filter, its output, denoted by
$y(t)$, is the convolution of $r(t)$ and $h(t)$ and is given by

$$\begin{aligned}
y(t) &= r(t) * h(t) \\
&= \int_{-\infty}^{\infty} r(\tau) h(t - \tau)\, d\tau \\
&= \int_{-\infty}^{\infty} r(\tau) x(T - t + \tau)\, d\tau
\end{aligned}$$ (4.2-48)

FIGURE 4.2-7
The structure of the correlation receiver with M correlators.

From Equation 4.2-48 it is clear that

$$r_x = y(T) = \int_{-\infty}^{\infty} r(\tau) x(\tau)\, d\tau$$ (4.2-49)

In other words, the output of the correlator $r_x$ can be obtained by sampling the output
of the matched filter at time $t = T$. Note that the sampling has to be done exactly at
time $t = T$, where $T$ is the arbitrary value used in the design of the matched filter. As
long as this condition is satisfied, the choice of $T$ is irrelevant; however, from a practical
point of view, $T$ has to be selected in such a way that the resulting filters are causal;
i.e., we must have $h(t) = 0$ for $t < 0$. This puts a practical limit on possible values of
$T$. A matched filter implementation of the optimal receiver is shown in Figure 4.2-8.

Another matched filter implementation with $M$ filters matched to $\{s_m(t), 1 \le m \le M\}$,
similar to the correlation receiver shown in Figure 4.2-7, is also possible.
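The equivalence between sampling the matched filter output at t = T and correlating can be checked in discrete time. In this sketch the signals are finite sample sequences of equal length, the integral is a Riemann sum, and the sampling instant T corresponds to the last sample index; these discretization choices and the function names are assumptions.

```python
import math


def correlate(r, x, dt: float) -> float:
    """Correlator output: Riemann-sum approximation of the integral of r(t) x(t) dt."""
    return sum(a * b for a, b in zip(r, x)) * dt


def matched_filter_output(r, x, dt: float):
    """Convolve r with h[k] = x[K - 1 - k], the discrete counterpart of h(t) = x(T - t)."""
    h = x[::-1]
    big_k = len(x)
    y = []
    for n in range(len(r) + big_k - 1):
        acc = 0.0
        # Only indices where both r[k] and h[n - k] exist contribute to the sum.
        for k in range(max(0, n - big_k + 1), min(n, len(r) - 1) + 1):
            acc += r[k] * h[n - k]
        y.append(acc * dt)
    return y


def sample_at_T(y, x_length: int) -> float:
    """Sample the filter output at the index corresponding to t = T."""
    return y[x_length - 1]
```

For sequences `r` and `x` of the same length, `sample_at_T(matched_filter_output(r, x, dt), len(x))` equals `correlate(r, x, dt)` up to rounding. Sampling at any other index gives a different value in general, which reflects the remark that the output must be sampled exactly at t = T.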


Frequency Domain Interpretation of the Matched Filter The matched filter to any
signal $s(t)$ has an interesting frequency-domain interpretation. Since $h(t) = s(T - t)$,
the Fourier transform of this relationship, using the basic properties of the Fourier
transform, is

$$H(f) = S^*(f) e^{-j2\pi fT}$$ (4.2-50)

FIGURE 4.2-8
The structure of a matched filter receiver with N correlators.

We observe that the matched filter has a frequency response that is the complex conjugate
of the transmitted signal spectrum multiplied by the phase factor $e^{-j2\pi fT}$, which
represents the sampling delay of $T$. In other words, $|H(f)| = |S(f)|$, so that the magnitude
response of the matched filter is identical to the transmitted signal spectrum. On
the other hand, the phase of $H(f)$ is the negative of the phase of $S(f)$ shifted by $2\pi fT$.

Another interesting property of the matched filter is its signal-to-noise maximizing
property. Let us assume that $r(t) = s(t) + n(t)$ is passed through a filter with impulse
response $h(t)$ and frequency response $H(f)$, and the output, denoted by $y(t) = y_s(t) + \nu(t)$,
is sampled at some time $T$. The output consists of a signal part, $y_s(t)$, whose
Fourier transform is $H(f)S(f)$, and a noise part, $\nu(t)$, whose power spectral density is
$\frac{N_0}{2}|H(f)|^2$. Sampling these components at time $T$ results in

$$y_s(T) = \int_{-\infty}^{\infty} H(f) S(f) e^{j2\pi fT}\, df$$ (4.2-51)

and a zero-mean Gaussian noise component, $\nu(T)$, whose variance is

$$\mathrm{VAR}[\nu(T)] = \frac{N_0}{2} \int_{-\infty}^{\infty} |H(f)|^2\, df = \frac{N_0}{2} \mathcal{E}_h$$ (4.2-52)

where $\mathcal{E}_h$ is the energy in $h(t)$. Now let us define the SNR at the output of the filter
$H(f)$ as

$$\mathrm{SNR}_o = \frac{y_s^2(T)}{\mathrm{VAR}[\nu(T)]}$$ (4.2-53)

From the Cauchy-Schwarz inequality given in Equation 2.2-19, we have

$$\begin{aligned}
y_s^2(T) &= \left|\int_{-\infty}^{\infty} H(f) S(f) e^{j2\pi fT}\, df\right|^2 \\
&\le \int_{-\infty}^{\infty} |H(f)|^2\, df \cdot \int_{-\infty}^{\infty} \left|S(f) e^{j2\pi fT}\right|^2 df \\
&= \mathcal{E}_h \mathcal{E}_s
\end{aligned}$$ (4.2-54)

with equality if and only if $H(f) = \alpha S^*(f) e^{-j2\pi fT}$ for some complex constant $\alpha$.
Using Equation 4.2-54 in 4.2-53, we conclude that

$$\mathrm{SNR}_o \le \frac{\mathcal{E}_s \mathcal{E}_h}{\frac{N_0}{2} \mathcal{E}_h} = \frac{2\mathcal{E}_s}{N_0}$$ (4.2-55)

This shows that the filter $H(f)$ that maximizes the signal-to-noise ratio at its output
must satisfy the relation $H(f) = S^*(f) e^{-j2\pi fT}$; i.e., it is the matched filter. It also
shows that the maximum possible signal-to-noise ratio at the output is $\frac{2\mathcal{E}_s}{N_0}$.
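The bound of Equation 4.2-55 can be checked in the time domain, where by Parseval's relation the sampled signal component is the integral of s(t) h(T - t) and the noise variance is (N0/2) times the energy of h(t). The sketch below uses that time-domain form with Riemann sums; the half-sine pulse, the parameter values, and the function names are illustrative assumptions.

```python
import math


def output_snr(g, s, n0: float, dt: float) -> float:
    """SNR at t = T for a filter with h(t) = g(T - t), where g and s are sample sequences."""
    signal_sample = sum(a * b for a, b in zip(g, s)) * dt   # y_s(T)
    energy_h = sum(a * a for a in g) * dt                   # energy of h(t), same as that of g(t)
    return signal_sample ** 2 / (0.5 * n0 * energy_h)       # Equation 4.2-53


def signal_energy(s, dt: float) -> float:
    """Energy of the sampled signal."""
    return sum(a * a for a in s) * dt


# Hypothetical half-sine pulse on [0, 1).
num_samples = 200
step = 1.0 / num_samples
pulse = [math.sin(math.pi * k * step) for k in range(num_samples)]
rectangle = [1.0] * num_samples   # a mismatched filter shape for comparison
```

Choosing g equal to the pulse (the matched filter) gives an output SNR of 2 E_s / N0, and scaling g by any nonzero constant leaves it unchanged. The rectangular filter, which is not proportional to the pulse, gives a strictly smaller value, as the Cauchy-Schwarz argument predicts.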

EXAMPLE 4.2-1. $M = 4$ biorthogonal signals are constructed from the two orthogonal
signals shown in Figure 4.2-9(a) for transmitting information over an AWGN channel.
The noise is assumed to have zero mean and power spectral density $\frac{1}{2}N_0$. Let us
determine the basis functions for this signal set, the impulse responses of the matched
filter demodulators, and the output waveforms of the matched filter demodulators when
the transmitted signal is $s_1(t)$.

FIGURE 4.2-9
Basis functions and matched filter response for Example 4.2-1.

The $M = 4$ biorthogonal signals have dimensions $N = 2$. Hence, two basis
functions are needed to represent the signals. From Figure 4.2-9(a), we choose $\phi_1(t)$
and $\phi_2(t)$ as

$$\phi_1(t) = \begin{cases} \sqrt{2/T} & 0 \le t \le \frac{1}{2}T \\ 0 & \text{otherwise} \end{cases} \qquad
\phi_2(t) = \begin{cases} \sqrt{2/T} & \frac{1}{2}T \le t \le T \\ 0 & \text{otherwise} \end{cases}$$ (4.2-56)

The impulse responses of the two matched filters are

$$h_1(t) = \phi_1(T - t) = \begin{cases} \sqrt{2/T} & \frac{1}{2}T \le t \le T \\ 0 & \text{otherwise} \end{cases} \qquad
h_2(t) = \phi_2(T - t) = \begin{cases} \sqrt{2/T} & 0 \le t \le \frac{1}{2}T \\ 0 & \text{otherwise} \end{cases}$$ (4.2-57)

and are illustrated in Figure 4.2-9(b).

If $s_1(t)$ is transmitted, the (noise-free) responses of the two matched filters are as
shown in Figure 4.2-9(c). Since $y_1(t)$ and $y_2(t)$ are sampled at $t = T$, we observe that
$y_{1s}(T) = \sqrt{\frac{1}{2}A^2T}$ and $y_{2s}(T) = 0$. Note that $\frac{1}{2}A^2T = \mathcal{E}$, the signal energy. Hence,
the received vector formed from the two matched filter outputs at the sampling instant
$t = T$ is

$$\mathbf{r} = (r_1, r_2) = (\sqrt{\mathcal{E}} + n_1,\; n_2) \tag{4.2-58}$$

where $n_1 = y_{1n}(T)$ and $n_2 = y_{2n}(T)$ are the noise components at the outputs of the
matched filters, given by

$$y_{kn}(T) = \int_0^T n(t)\phi_k(t)\,dt, \qquad k = 1, 2 \tag{4.2-59}$$

Clearly, $E[n_k] = E[y_{kn}(T)] = 0$. Their variance from Equation 4.2-52 is

$$\mathrm{VAR}[n_k] = \frac{N_0}{2}\mathcal{E}_{\phi_k} = \frac{1}{2}N_0 \tag{4.2-60}$$

Observe that the SNR for the first matched filter is

$$\mathrm{SNR}_o = \frac{(\sqrt{\mathcal{E}})^2}{\frac{1}{2}N_0} = \frac{2\mathcal{E}}{N_0} \tag{4.2-61}$$

which agrees with our previous result.
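The numbers in Example 4.2-1 can be checked numerically. The sketch below is only an illustration: it assumes that $s_1(t)$ in Figure 4.2-9(a) is a rectangular pulse of amplitude $A$ on $[0, T/2]$ (the figure is not reproduced here, but this shape is consistent with $\mathcal{E} = \frac{1}{2}A^2T$), and the function name and grid size are hypothetical.

```python
import math

def matched_filter_outputs(A=1.0, T=1.0, n=2000):
    """Noise-free matched-filter outputs sampled at t = T for Example 4.2-1.

    Assumption: s1(t) = A on [0, T/2) and 0 elsewhere.
    """
    dt = T / n
    t = [(i + 0.5) * dt for i in range(n)]             # midpoint grid on [0, T]
    amp = math.sqrt(2.0 / T)
    phi1 = [amp if ti < T / 2 else 0.0 for ti in t]    # basis functions, Eq. 4.2-56
    phi2 = [amp if ti >= T / 2 else 0.0 for ti in t]
    s1 = [A if ti < T / 2 else 0.0 for ti in t]        # assumed transmitted pulse
    # Sampling h_k(t) = phi_k(T - t) at t = T gives the correlation of s1 with phi_k.
    y1 = sum(s * p for s, p in zip(s1, phi1)) * dt
    y2 = sum(s * p for s, p in zip(s1, phi2)) * dt
    return y1, y2

A, T, N0 = 2.0, 1.0, 0.5
y1, y2 = matched_filter_outputs(A, T)
energy = 0.5 * A * A * T                 # signal energy
snr_out = y1 ** 2 / (0.5 * N0)           # Eq. 4.2-61, should equal 2 * energy / N0
```

Under these assumptions `y1` matches $\sqrt{\mathcal{E}}$, `y2` is zero, and `snr_out` matches $2\mathcal{E}/N_0$.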


4.2-3 A Union Bound on the Probability of Error of Maximum 
Likelihood Detection 

In general, to determine the error probability of a signaling scheme, we need to use
Equation 4.1-13. In the special case where the messages are equiprobable, $P_m = 1/M$
and maximum likelihood detection is optimal. The error probability in this case becomes

$$P_e = \frac{1}{M}\sum_{m=1}^{M} P_{e|m} = \frac{1}{M}\sum_{m=1}^{M}\;\sum_{\substack{1 \le m' \le M \\ m' \ne m}} \int_{D_{m'}} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} \tag{4.2-62}$$


For an AWGN channel the decision regions are given by Equation 4.2-20. Therefore,
for AWGN channels we have

$$P_{e|m} = \sum_{\substack{1 \le m' \le M \\ m' \ne m}} \int_{D_{m'}} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} \tag{4.2-63}$$

For very few constellations, the decision regions $D_{m'}$ are regular enough that the integrals
in the last line of Equation 4.2-63 or Equation 4.2-62 can be computed in closed
form. For most constellations (for example, look at Figure 4.2-1) these integrals cannot
be put in closed form. In such cases it is convenient to have upper bounds for the error
probability. There exist many bounds on the error probability under ML detection. The
union bound is the simplest and most widely used bound, and it is quite tight, particularly
at high signal-to-noise ratios.

We first derive the union bound for a general communication channel and then
study the AWGN channel as a special case. First we note that in general the decision
region $D_{m'}$ under ML detection can be expressed as

$$D_{m'} = \left\{\mathbf{r} \in \mathbb{R}^N : p(\mathbf{r}|\mathbf{s}_{m'}) > p(\mathbf{r}|\mathbf{s}_k), \text{ for all } 1 \le k \le M \text{ and } k \ne m'\right\} \tag{4.2-64}$$

Let us define $D_{mm'}$ as

$$D_{mm'} = \left\{p(\mathbf{r}|\mathbf{s}_{m'}) > p(\mathbf{r}|\mathbf{s}_m)\right\} \tag{4.2-65}$$

Note that $D_{mm'}$ is the decision region for $m'$ in a binary equiprobable system with
signals $\mathbf{s}_m$ and $\mathbf{s}_{m'}$. Comparing the definitions of $D_{m'}$ and $D_{mm'}$, we obviously have

$$D_{m'} \subseteq D_{mm'} \tag{4.2-66}$$

hence

$$\int_{D_{m'}} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} \le \int_{D_{mm'}} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} \tag{4.2-67}$$

Note that the right-hand side of this equation is the error probability of a binary
equiprobable system with signals $\mathbf{s}_m$ and $\mathbf{s}_{m'}$ when $\mathbf{s}_m$ is transmitted. We define the
pairwise error probability, denoted by $P_{m \to m'}$, as

$$P_{m \to m'} = \int_{D_{mm'}} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} \tag{4.2-68}$$

From Equations 4.2-63 and 4.2-67 we have

$$P_{e|m} \le \sum_{\substack{1 \le m' \le M \\ m' \ne m}} \int_{D_{mm'}} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} = \sum_{\substack{1 \le m' \le M \\ m' \ne m}} P_{m \to m'} \tag{4.2-69}$$

and from Equation 4.2-62 we conclude that

$$P_e \le \frac{1}{M}\sum_{m=1}^{M}\;\sum_{\substack{1 \le m' \le M \\ m' \ne m}} \int_{D_{mm'}} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} = \frac{1}{M}\sum_{m=1}^{M}\;\sum_{\substack{1 \le m' \le M \\ m' \ne m}} P_{m \to m'} \tag{4.2-70}$$

Equation 4.2-70 is the union bound for a general communication channel.

In the special case of an AWGN channel, we know from Equation 4.2-37 that the
pairwise error probability is given by

$$P_{m \to m'} = P_b = Q\left(\sqrt{\frac{d_{mm'}^2}{2N_0}}\right) \tag{4.2-71}$$

By using this result, Equation 4.2-70 becomes

$$P_e \le \frac{1}{M}\sum_{m=1}^{M}\;\sum_{\substack{1 \le m' \le M \\ m' \ne m}} Q\left(\sqrt{\frac{d_{mm'}^2}{2N_0}}\right) \le \frac{1}{2M}\sum_{m=1}^{M}\;\sum_{\substack{1 \le m' \le M \\ m' \ne m}} e^{-\frac{d_{mm'}^2}{4N_0}} \tag{4.2-72}$$

where in the last step we have used the upper bound on the Q function given in
Equation 2.3-15 as

$$Q(x) \le \frac{1}{2}e^{-\frac{x^2}{2}} \tag{4.2-73}$$

Equation 4.2-72 is the general form of the union bound for an AWGN channel. If
we know the distance structure of the constellation, we can further simplify this bound.
Let us define $T(X)$, the distance enumerator function for a constellation, as

$$T(X) = \sum_{\substack{d_{mm'} = \|\mathbf{s}_m - \mathbf{s}_{m'}\| \\ 1 \le m, m' \le M \\ m \ne m'}} X^{d_{mm'}^2} = \sum_{\text{all distinct } d\text{'s}} a_d X^{d^2} \tag{4.2-74}$$

where $a_d$ denotes the number of ordered pairs $(m, m')$ such that $m \ne m'$ and
$\|\mathbf{s}_m - \mathbf{s}_{m'}\| = d$. Using this function, Equation 4.2-72 can be written as

$$P_e \le \frac{1}{2M}\,T(X)\Big|_{X = e^{-\frac{1}{4N_0}}} \tag{4.2-75}$$


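Let us define $d_{\min}$, the minimum distance of a constellation, as

$$d_{\min} = \min_{\substack{1 \le m, m' \le M \\ m \ne m'}} \|\mathbf{s}_m - \mathbf{s}_{m'}\| \tag{4.2-76}$$

Since $Q(\cdot)$ is decreasing, we have

$$Q\left(\sqrt{\frac{d_{mm'}^2}{2N_0}}\right) \le Q\left(\sqrt{\frac{d_{\min}^2}{2N_0}}\right) \tag{4.2-77}$$

Substituting in Equation 4.2-70 results in

$$P_e \le (M - 1)\,Q\left(\sqrt{\frac{d_{\min}^2}{2N_0}}\right) \tag{4.2-78}$$

Equation 4.2-78 is a looser form of the union bound in terms of the Q function and
$d_{\min}$, which has a very simple form. Using the exponential bound for the Q function we
have the union bound in the simple form

$$P_e \le \frac{M - 1}{2}\,e^{-\frac{d_{\min}^2}{4N_0}} \tag{4.2-79}$$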


The union bound clearly shows that the minimum distance of a constellation has 
an important impact on the performance of the communication system. A good con- 
stellation should be designed such that, within the power and bandwidth constraints, it 
provides the maximum possible minimum distance; i.e., the points in the constellation 
should be maximally separated. 


EXAMPLE 4.2-2. Let us consider the 16-QAM constellation shown in Figure 4.2-10.
We assume that the distance between any two adjacent points on the constellation is
$d_{\min}$. From Equation 3.2-44 we have

$$d_{\min} = \sqrt{\frac{6\log_2 M}{M - 1}\,\mathcal{E}_{bavg}} = \sqrt{\frac{8}{5}\,\mathcal{E}_{bavg}} \tag{4.2-80}$$

FIGURE 4.2-10

16-QAM constellation.

Close observation of this constellation shows that from a total of 16 × 15 = 240
possible distances between any two points in the constellation, 48 are equal to $d_{\min}$,
36 are equal to $\sqrt{2}\,d_{\min}$, 32 are $2d_{\min}$, 48 are $\sqrt{5}\,d_{\min}$, 16 are $\sqrt{8}\,d_{\min}$, 16 are $3d_{\min}$, 24 are
$\sqrt{10}\,d_{\min}$, 16 are $\sqrt{13}\,d_{\min}$, and finally 4 are $\sqrt{18}\,d_{\min}$. Note that each line connecting
any two points in the constellation is counted twice. Therefore, the distance enumerator
function for this constellation is given by

$$T(X) = 48X^{d^2} + 36X^{2d^2} + 32X^{4d^2} + 48X^{5d^2} + 16X^{8d^2} + 16X^{9d^2} + 24X^{10d^2} + 16X^{13d^2} + 4X^{18d^2} \tag{4.2-81}$$

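where for ease of notation we have substituted $d_{\min}$ by $d$. The union bound becomes

$$P_e \le \frac{1}{32}\,T\!\left(e^{-\frac{1}{4N_0}}\right) = \frac{1}{32}\left(48e^{-\frac{d^2}{4N_0}} + 36e^{-\frac{2d^2}{4N_0}} + \cdots + 4e^{-\frac{18d^2}{4N_0}}\right) \tag{4.2-82}$$

A looser, but simpler, form of the union bound is obtained in terms of $d_{\min}$ as

$$P_e \le \frac{M - 1}{2}\,e^{-\frac{d_{\min}^2}{4N_0}} = \frac{15}{2}\,e^{-\frac{2\mathcal{E}_{bavg}}{5N_0}} \tag{4.2-83}$$

where in the last step we have used Equation 4.2-80.

In the case when $d_{\min}^2$ is large compared to $N_0$, i.e., when the SNR is large, the first
term is the dominating term in Equation 4.2-82. In this case we have

$$P_e \lesssim \frac{48}{32}\,e^{-\frac{d_{\min}^2}{4N_0}} = \frac{3}{2}\,e^{-\frac{2\mathcal{E}_{bavg}}{5N_0}} \tag{4.2-84}$$

It turns out that for this constellation it is possible to derive an exact expression for
the error probability (see Example 4.3-1), and the expression for the error probability
is given by

$$P_e = 3Q\left(\sqrt{\frac{4\mathcal{E}_{bavg}}{5N_0}}\right) - \frac{9}{4}\left[Q\left(\sqrt{\frac{4\mathcal{E}_{bavg}}{5N_0}}\right)\right]^2 \tag{4.2-85}$$

Plots of the exact error probability and the upper bounds given by Equations 4.2-83
and 4.2-84 are shown in Figure 4.2-11.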
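The distance counts quoted in Example 4.2-2 can be reproduced by brute force, and the bounds can then be compared with the exact expression. The following is a minimal sketch; the function names and the unit-spaced 4 × 4 grid (so that distances are measured in units of $d_{\min}$) are illustrative assumptions.

```python
import math
from collections import Counter

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def squared_distance_counts(side=4):
    """Count ordered pairs of grid points by squared distance (units of d_min^2)."""
    pts = [(i, j) for i in range(side) for j in range(side)]
    counts = Counter()
    for p in pts:
        for q in pts:
            if p != q:
                counts[(p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2] += 1
    return counts

def bounds_16qam(ebn0_db):
    """Return (exact, union, loose, dominant) for 16-QAM at a given E_bavg/N0 in dB."""
    g = 10.0 ** (ebn0_db / 10.0)
    x = 0.4 * g                      # d_min^2 / (4 N0) = 2 E_bavg / (5 N0), from Eq. 4.2-80
    counts = squared_distance_counts(4)
    union = sum(a * math.exp(-k * x) for k, a in counts.items()) / 32.0  # Eq. 4.2-82
    loose = 7.5 * math.exp(-x)                                            # Eq. 4.2-83
    dominant = 1.5 * math.exp(-x)                                         # Eq. 4.2-84
    q = q_func(math.sqrt(0.8 * g))
    exact = 3.0 * q - 2.25 * q * q                                        # Eq. 4.2-85
    return exact, union, loose, dominant

counts = squared_distance_counts(4)
exact, union, loose, dominant = bounds_16qam(10.0)
```

The keys of `counts` are the squared distances 1, 2, 4, 5, 8, 9, 10, 13, and 18, which correspond to the exponents in Equation 4.2-81.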


A Lower Bound on the Probability of Error 

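In an equiprobable $M$-ary signaling scheme, the error probability is given by

$$P_e = \frac{1}{M}\sum_{m=1}^{M} P[\text{Error} \mid m \text{ sent}] = \frac{1}{M}\sum_{m=1}^{M} \int_{D_m^c} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} \tag{4.2-86}$$

FIGURE 4.2-11

Comparison of the exact error probability and two upper bounds for rectangular 16-QAM.

From Equation 4.2-66 we have $D_{m'm}^c \subseteq D_m^c$; hence,

$$P_e \ge \frac{1}{M}\sum_{m=1}^{M} \int_{D_{m'm}^c} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} = \frac{1}{M}\sum_{m=1}^{M} \int_{D_{mm'}} p(\mathbf{r}|\mathbf{s}_m)\,d\mathbf{r} = \frac{1}{M}\sum_{m=1}^{M} Q\left(\frac{d_{mm'}}{\sqrt{2N_0}}\right) \tag{4.2-87}$$

Equation 4.2-87 is valid for all $m' \ne m$. To derive the tightest lower bound, we
need to maximize the right-hand side. Therefore we can write

$$P_e \ge \frac{1}{M}\sum_{m=1}^{M} \max_{m' \ne m} Q\left(\frac{d_{mm'}}{\sqrt{2N_0}}\right) \tag{4.2-88}$$

Since the Q function is a decreasing function of its variable, choosing $m'$ that maximizes
$Q\left(d_{mm'}/\sqrt{2N_0}\right)$ is equivalent to finding $m'$ such that $d_{mm'}$ is minimized. Hence,

$$P_e \ge \frac{1}{M}\sum_{m=1}^{M} Q\left(\frac{d_{\min}^m}{\sqrt{2N_0}}\right) \tag{4.2-89}$$

where $d_{\min}^m$ denotes the distance from $m$ to its nearest neighbor in the constellation, and
obviously $d_{\min}^m \ge d_{\min}$. Therefore,

$$Q\left(\frac{d_{\min}^m}{\sqrt{2N_0}}\right) \ge \begin{cases} Q\left(\dfrac{d_{\min}}{\sqrt{2N_0}}\right) & \text{if there exists at least one signal at distance } d_{\min} \text{ from } \mathbf{s}_m \\ 0 & \text{otherwise} \end{cases} \tag{4.2-90}$$

By using Equation 4.2-90, Equation 4.2-89 becomes

$$P_e \ge \frac{1}{M}\sum_{\substack{1 \le m \le M \\ \exists\, m' \ne m:\ \|\mathbf{s}_m - \mathbf{s}_{m'}\| = d_{\min}}} Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \tag{4.2-91}$$

Denoting by $N_{\min}$ the number of points in the constellation that are at distance
$d_{\min}$ from at least one other point in the constellation, we obtain

$$P_e \ge \frac{N_{\min}}{M}\,Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \tag{4.2-92}$$

From Equations 4.2-92 and 4.2-78, it is clear that

$$\frac{N_{\min}}{M}\,Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \le P_e \le (M - 1)\,Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \tag{4.2-93}$$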
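The two-sided bound of Equation 4.2-93 needs only $d_{\min}$, $N_{\min}$, and $M$, so it is easy to evaluate for any constellation given as a list of points. The sketch below is illustrative; the function name and the tolerance used to decide whether a nearest-neighbor distance equals $d_{\min}$ are assumptions.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def dmin_bounds(points, n0, tol=1e-9):
    """Return (lower, upper) bounds on P_e from Eq. 4.2-93."""
    m = len(points)
    # Distance from each point to its nearest neighbor.
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    d_min = min(nearest)
    # N_min: points that have at least one neighbor at distance d_min.
    n_min = sum(1 for d in nearest if d - d_min <= tol)
    q = q_func(d_min / math.sqrt(2.0 * n0))
    return n_min / m * q, (m - 1) * q

# QPSK with points (+-1, +-1) and N0 = 0.5, so that d_min / sqrt(2 N0) = 2.
qpsk = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
lower, upper = dmin_bounds(qpsk, 0.5)
exact = 1.0 - (1.0 - q_func(2.0)) ** 2   # exact QPSK symbol error probability
```

For this QPSK case the exact error probability lies between `lower` and `upper`, as Equation 4.2-93 requires.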


■ 4.3 

OPTIMAL DETECTION AND ERROR PROBABILITY 
FOR BAND-LIMITED SIGNALING 

In this section we study signaling schemes that are mainly characterized by their low 
bandwidth requirements. These signaling schemes have low dimensionality which is 
independent from the number of transmitted signals, and, as we will see, their power 
efficiency decreases when the number of messages increases. This family of signaling 
schemes includes ASK, PSK, and QAM. 


4.3-1 Optimal Detection and Error Probability for ASK or PAM Signaling 


The constellation for an ASK signaling scheme is shown in Figure 4.3-1. In this
constellation the minimum distance between any two points is $d_{\min}$, which is given by
Equation 3.2-22 as

$$d_{\min} = \sqrt{\frac{12\log_2 M}{M^2 - 1}\,\mathcal{E}_{bavg}} \tag{4.3-1}$$

FIGURE 4.3-1

The ASK constellation.

The constellation points are located at $\left\{\pm\frac{1}{2}d_{\min}, \pm\frac{3}{2}d_{\min}, \ldots, \pm\frac{M-1}{2}d_{\min}\right\}$.

We notice there exist two types of points in the ASK constellation. There
are $M - 2$ inner points and 2 outer points in the constellation. If an inner point is
transmitted, there will be an error in detection if $|n| > \frac{1}{2}d_{\min}$. For the outer points, the
probability of error is one-half of the error probability of an inner point since noise can
cause error in only one direction. Let us denote the error probabilities of inner points
and outer points by $P_{ei}$ and $P_{eo}$, respectively. Since $n$ is a zero-mean Gaussian random
variable with variance $\frac{1}{2}N_0$, we have

$$P_{ei} = P\left[|n| > \tfrac{1}{2}d_{\min}\right] = 2Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \tag{4.3-2}$$

and for the outer points

$$P_{eo} = \frac{1}{2}P_{ei} = Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \tag{4.3-3}$$

The symbol error probability is given by

$$P_e = \frac{1}{M}\sum_{m=1}^{M} P[\text{error} \mid m \text{ sent}] = \frac{1}{M}\left[2(M - 2)\,Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) + 2Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right)\right] = \frac{2(M - 1)}{M}\,Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \tag{4.3-4}$$

Substituting for $d_{\min}$ from Equation 4.3-1 yields

$$P_e = 2\left(1 - \frac{1}{M}\right) Q\left(\sqrt{\frac{6\log_2 M}{M^2 - 1}\,\frac{\mathcal{E}_{bavg}}{N_0}}\right) \approx 2Q\left(\sqrt{\frac{6\log_2 M}{M^2 - 1}\,\frac{\mathcal{E}_{bavg}}{N_0}}\right) \quad \text{for large } M \tag{4.3-5}$$

Note that the average SNR/bit $\mathcal{E}_{bavg}/N_0$ is scaled by $\frac{6\log_2 M}{M^2 - 1}$. This factor goes to 0 as $M$
increases, which means that to keep the error probability constant as $M$ increases, the
SNR/bit must increase. For large $M$, doubling $M$, which is equivalent to increasing
the transmission rate by 1 bit per transmission, would roughly need the SNR/bit to
quadruple, i.e., an increase of 6 dB, to keep the performance the same. In other words,
as a rule of thumb, for increasing the transmission rate by 1 bit, one would need 6 dB
more power.

Plots of the error probability of baseband PAM and ASK as a function of the
average SNR/bit for different values of $M$ are given in Figure 4.3-2. It is clear that
increasing $M$ deteriorates the performance, and for large $M$ the distance between curves
corresponding to $M$ and $2M$ is roughly 6 dB.

FIGURE 4.3-2

Symbol error probability for baseband PAM and ASK.
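Equation 4.3-5 and the 6 dB rule of thumb are straightforward to evaluate. The following sketch is illustrative; the function names are hypothetical, and the "penalty" is defined here as the increase in SNR/bit that keeps the argument of the Q function unchanged when $M$ is doubled.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pam_ser(M, ebn0_db):
    """Symbol error probability of M-ary PAM/ASK, Eq. 4.3-5 (exact form)."""
    g = 10.0 ** (ebn0_db / 10.0)
    arg = math.sqrt(6.0 * math.log2(M) / (M * M - 1) * g)
    return 2.0 * (1.0 - 1.0 / M) * q_func(arg)

def doubling_penalty_db(M):
    """Extra SNR/bit (dB) needed to keep the Q-function argument fixed going from M to 2M."""
    scale = lambda m: 6.0 * math.log2(m) / (m * m - 1)
    return 10.0 * math.log10(scale(M) / scale(2 * M))

ser_binary = pam_ser(2, 8.0)            # reduces to Q(sqrt(2 Eb/N0)) for M = 2
penalty_64 = doubling_penalty_db(64)    # somewhat below 6 dB because of the log2 M factor
penalty_large = doubling_penalty_db(2 ** 20)
```

The penalty approaches about 6 dB only slowly, since the ratio of the $\log_2 M$ factors tends to 1 as $M$ grows.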


4.3-2 Optimal Detection and Error Probability for PSK Signaling 

The constellation for an $M$-ary PSK signaling scheme is shown in Figure 4.3-3. In this
constellation the decision region $D_1$ is also shown. Note that since we are assuming the
messages are equiprobable, the decision regions are based on the minimum-distance
detection rule. By symmetry of the constellation, the error probability of the system is
equal to the error probability when $\mathbf{s}_1 = (\sqrt{\mathcal{E}}, 0)$ is transmitted. The received vector $\mathbf{r}$
is given by

$$\mathbf{r} = (r_1, r_2) = (\sqrt{\mathcal{E}} + n_1,\; n_2) \tag{4.3-6}$$

FIGURE 4.3-3

The constellation for PSK signaling.

It is seen that $r_1$ and $r_2$ are independent Gaussian random variables with variance
$\sigma^2 = \frac{1}{2}N_0$ and means $\sqrt{\mathcal{E}}$ and 0, respectively; hence

$$p(r_1, r_2) = \frac{1}{\pi N_0}\,e^{-\frac{(r_1 - \sqrt{\mathcal{E}})^2 + r_2^2}{N_0}} \tag{4.3-7}$$

Since the decision region $D_1$ can be more conveniently described using polar
coordinates, we introduce polar coordinate transformations of $(r_1, r_2)$ as

$$V = \sqrt{r_1^2 + r_2^2}, \qquad \Theta = \arctan\frac{r_2}{r_1} \tag{4.3-8}$$

from which the joint PDF of $V$ and $\Theta$ can be derived as

$$p_{V,\Theta}(v, \theta) = \frac{v}{\pi N_0}\,e^{-\frac{v^2 + \mathcal{E} - 2\sqrt{\mathcal{E}}\,v\cos\theta}{N_0}} \tag{4.3-9}$$

Integrating over $v$, we derive the marginal PDF of $\Theta$ as

$$p_\Theta(\theta) = \int_0^\infty p_{V,\Theta}(v, \theta)\,dv = \frac{1}{2\pi}\,e^{-\gamma_s \sin^2\theta} \int_0^\infty v\,e^{-\frac{(v - \sqrt{2\gamma_s}\cos\theta)^2}{2}}\,dv \tag{4.3-10}$$

in which we have defined the symbol SNR or SNR per symbol as

$$\gamma_s = \frac{\mathcal{E}}{N_0} \tag{4.3-11}$$

Figure 4.3-4 illustrates $p_\Theta(\theta)$ for several values of $\gamma_s$. Note that $p_\Theta(\theta)$ becomes
narrower and more peaked about $\theta = 0$ as $\gamma_s$ increases.

FIGURE 4.3-4

The PDF of $\Theta$ for $\gamma_s$ = 1, 2, 4, and 10.

The decision region $D_1$ can be described as $D_1 = \{\theta : -\pi/M < \theta < \pi/M\}$;
therefore, the message error probability is given by

$$P_e = 1 - \int_{-\pi/M}^{\pi/M} p_\Theta(\theta)\,d\theta \tag{4.3-12}$$

In general, the integral of $p_\Theta(\theta)$ does not reduce to a simple form and must be
evaluated numerically, except for $M = 2$ and $M = 4$.

For binary phase modulation, the two signals $s_1(t)$ and $s_2(t)$ are antipodal, and
hence the error probability is

$$P_b = Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right) \tag{4.3-13}$$
When $M = 4$, we have in effect two binary phase-modulation signals in phase quadrature.
Since there is no crosstalk or interference between the signals on the two quadrature
carriers, the bit error probability is identical to that in Equation 4.3-13. On the other
hand, the symbol error probability for $M = 4$ is determined by noting that

$$P_c = (1 - P_b)^2 = \left[1 - Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)\right]^2 \tag{4.3-14}$$

where $P_c$ is the probability of a correct decision for the 2-bit symbol. Equation 4.3-14
follows from the statistical independence of the noise on the quadrature carriers.
Therefore, the symbol error probability for $M = 4$ is

$$P_e = 1 - P_c = 2Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)\left[1 - \frac{1}{2}Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)\right] \tag{4.3-15}$$
For $M > 4$, the symbol error probability $P_e$ is obtained by numerically integrating
Equation 4.3-12. Figure 4.3-5 illustrates this error probability as a function of the SNR
per bit for $M$ = 2, 4, 8, 16, and 32. The graphs clearly illustrate the penalty in SNR per
bit as $M$ increases beyond $M = 4$. For example, at $P_e = 10^{-5}$, the difference between
$M = 4$ and $M = 8$ is approximately 4 dB, and the difference between $M = 8$ and
$M = 16$ is approximately 5 dB. For large values of $M$, doubling the number of phases
requires an additional 6 dB/bit to achieve the same performance. This performance is
similar to the performance of ASK signaling discussed in Section 4.3-1.

FIGURE 4.3-5

Probability of symbol error for PSK signals. (Horizontal axis: SNR per bit, $\gamma_b$ (dB).)

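An approximation to the error probability for large values of $M$ and for large SNR
may be obtained by first approximating $p_\Theta(\theta)$. For $\mathcal{E}/N_0 \gg 1$ and $|\theta| \le \frac{1}{2}\pi$, $p_\Theta(\theta)$ is
well approximated as

$$p_\Theta(\theta) \approx \sqrt{\frac{\gamma_s}{\pi}}\,\cos\theta\,e^{-\gamma_s \sin^2\theta} \tag{4.3-16}$$

By substituting for $p_\Theta(\theta)$ in Equation 4.3-12 and performing the change in variable
from $\theta$ to $u = \sqrt{2\gamma_s}\sin\theta$, we find that

$$\begin{aligned} P_e &\approx 1 - \int_{-\pi/M}^{\pi/M} \sqrt{\frac{\gamma_s}{\pi}}\,\cos\theta\,e^{-\gamma_s \sin^2\theta}\,d\theta \\ &= \frac{2}{\sqrt{2\pi}} \int_{\sqrt{2\gamma_s}\sin(\pi/M)}^{\infty} e^{-\frac{u^2}{2}}\,du \\ &= 2Q\left(\sqrt{2\gamma_s}\,\sin\frac{\pi}{M}\right) \\ &= 2Q\left(\sqrt{(2\log_2 M)\sin^2\left(\frac{\pi}{M}\right)\frac{\mathcal{E}_b}{N_0}}\right) \end{aligned} \tag{4.3-17}$$

where we have used the definition of the SNR per bit as

$$\gamma_b = \frac{\mathcal{E}_b}{N_0} = \frac{\mathcal{E}}{N_0 \log_2 M} = \frac{\gamma_s}{\log_2 M} \tag{4.3-18}$$

Note that this approximation† to the error probability is good for all values of $M$.
For example, when $M = 2$ and $M = 4$, we have $P_e = 2Q(\sqrt{2\gamma_b})$, which compares
favorably with the exact probabilities given by Equations 4.3-13 and 4.3-15.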

For the case when $M$ is large, we can use the approximation $\sin\frac{\pi}{M} \approx \frac{\pi}{M}$ to find
another approximation to the error probability for large $M$ as

$$P_e \approx 2Q\left(\sqrt{\frac{2\pi^2 \log_2 M}{M^2}\,\frac{\mathcal{E}_b}{N_0}}\right) \quad \text{for large } M \tag{4.3-19}$$

From Equation 4.3-19 it is clear that doubling $M$ reduces the effective SNR per bit
by 6 dB.

The equivalent bit error probability for $M$-ary PSK is rather tedious to derive due to
its dependence on the mapping of $k$-bit symbols into the corresponding signal phases.
When a Gray code is used in the mapping, two $k$-bit symbols corresponding to adjacent
signal phases differ in only a single bit. Since the most probable errors due to noise
result in the erroneous selection of an adjacent phase to the true phase, most $k$-bit
symbol errors contain only a single-bit error. Hence, the equivalent bit error probability
for $M$-ary PSK is well approximated as

$$P_b \approx \frac{1}{k}P_e \tag{4.3-20}$$

†A better approximation of the error probability at low SNR is given in the paper by Lu et al. (1999).
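The numerical integration of Equation 4.3-12 and its comparison with the approximation $2Q(\sqrt{2\gamma_s}\sin(\pi/M))$ of Equation 4.3-17 can be sketched as follows. Two choices here are assumptions of this sketch rather than steps taken in the text: the inner integral of Equation 4.3-10 is evaluated in closed form as $e^{-a^2/2} + a\sqrt{2\pi}\,Q(-a)$ with $a = \sqrt{2\gamma_s}\cos\theta$, and the error probability is computed as twice the integral of $p_\Theta(\theta)$ from $\pi/M$ to $\pi$ (using symmetry) to avoid subtracting two nearly equal numbers.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def phase_pdf(theta, gamma_s):
    """p_Theta(theta) of Eq. 4.3-10 with the integral over v done analytically."""
    a = math.sqrt(2.0 * gamma_s) * math.cos(theta)
    inner = math.exp(-0.5 * a * a) + a * math.sqrt(2.0 * math.pi) * q_func(-a)
    return math.exp(-gamma_s * math.sin(theta) ** 2) * inner / (2.0 * math.pi)

def psk_ser(M, gamma_s, n=4000):
    """Symbol error probability of M-ary PSK (Eq. 4.3-12) by Simpson's rule; n must be even."""
    lo, hi = math.pi / M, math.pi
    h = (hi - lo) / n
    total = phase_pdf(lo, gamma_s) + phase_pdf(hi, gamma_s)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * phase_pdf(lo + i * h, gamma_s)
    return 2.0 * total * h / 3.0      # factor 2: the PDF is symmetric about theta = 0

def psk_ser_approx(M, gamma_s):
    """High-SNR approximation of Eq. 4.3-17."""
    return 2.0 * q_func(math.sqrt(2.0 * gamma_s) * math.sin(math.pi / M))

gamma_s = 20.0                         # SNR per symbol (linear scale)
exact_8 = psk_ser(8, gamma_s)
approx_8 = psk_ser_approx(8, gamma_s)
```

For $M = 2$ and $M = 4$ the numerical result can be compared with the closed forms of Equations 4.3-13 and 4.3-15, which gives a useful sanity check on the integration.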


Differentially Encoded PSK Signaling 

Our treatment of the demodulation of PSK signals assumed that the demodulator had
a perfect estimate of the carrier phase available. In practice, however, the carrier phase
is extracted from the received signal by performing some nonlinear operation that
introduces a phase ambiguity. For example, in binary PSK, the signal is often squared in
order to remove the modulation, and the double-frequency component that is generated
is filtered and divided by 2 in frequency in order to extract an estimate of the carrier
frequency and phase $\phi$. These operations result in a phase ambiguity of 180° in the
carrier phase. Similarly, in four-phase PSK, the received signal is raised to the fourth
power to remove the digital modulation, and the resulting fourth harmonic of the carrier
frequency is filtered and divided by 4 to extract the carrier component. These operations
yield a carrier frequency component containing the estimate of the carrier phase $\phi$, but
there are phase ambiguities of ±90° and 180° in the phase estimate. Consequently, we
do not have an absolute estimate of the carrier phase for demodulation.

The phase ambiguity problem resulting from the estimation of the carrier phase $\phi$
can be overcome by encoding the information in phase differences between successive
signal transmissions as opposed to absolute phase encoding. For example, in binary
PSK, the information bit 1 may be transmitted by shifting the phase of the carrier by
180° relative to the previous carrier phase, while the information bit 0 is transmitted
by a zero phase shift relative to the phase in the previous signaling interval. In
four-phase PSK, the relative phase shifts between successive intervals are 0°, 90°, 180°,
and −90°, corresponding to the information bits 00, 01, 11, and 10, respectively. The
generalization to $M$ phases is straightforward. The PSK signals resulting from the
encoding process are said to be differentially encoded. The encoding is performed by
a relatively simple logic circuit preceding the modulator.

Demodulation of the differentially encoded PSK signal is performed as described 
above, by ignoring the phase ambiguities. Thus, the received signal is demodulated 
and detected to one of the M possible transmitted phases in each signaling inter- 
val. Following the detector is a relatively simple phase comparator that compares 
the phases of the demodulated signal over two consecutive intervals to extract the 
information. 

Coherent demodulation of differentially encoded PSK results in a higher probability
of error than the error probability derived for absolute phase encoding. With
differentially encoded PSK, an error in the demodulated phase of the signal in any given interval
will usually result in decoding errors over two consecutive signaling intervals. This is
especially the case for error probabilities below 0.1. Therefore, the probability of error
in differentially encoded $M$-ary PSK is approximately twice the probability of error for
$M$-ary PSK with absolute phase encoding. However, this factor-of-2 increase in the
error probability translates into a relatively small loss in SNR.
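Both properties described above, immunity to a constant phase ambiguity and the doubling of errors, can be illustrated with phase indices taken modulo $M$. This is a minimal sketch: phases are represented as integers in units of $2\pi/M$, and the function names and the initial reference phase are hypothetical.

```python
def diff_encode(symbols, M, start=0):
    """Map information symbols (phase increments) to absolute phase indices.

    The returned list holds one reference phase followed by one phase per symbol.
    """
    phases, prev = [start], start
    for s in symbols:
        prev = (prev + s) % M
        phases.append(prev)
    return phases

def diff_decode(phases, M):
    """Recover the symbols as phase differences between consecutive intervals."""
    return [(b - a) % M for a, b in zip(phases, phases[1:])]

M = 4
data = [0, 1, 3, 2, 2, 0, 1]
tx = diff_encode(data, M)

# A constant phase ambiguity (any multiple of 2*pi/M) cancels in the differences.
rotated = [(p + 1) % M for p in tx]
recovered = diff_decode(rotated, M)

# A single wrongly detected phase corrupts two consecutive decoded symbols.
corrupted = list(tx)
corrupted[3] = (corrupted[3] + 1) % M
n_errors = sum(a != b for a, b in zip(diff_decode(corrupted, M), data))
```

Here `recovered` equals `data` for any rotation, while the single phase error in `corrupted` produces two symbol errors, in line with the factor-of-2 statement above.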
4.3-3 Optimal Detection and Error Probability for QAM Signaling


In optimal detection of QAM signals, we need two filters matched to

$$\begin{aligned} \phi_1(t) &= \sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\cos 2\pi f_c t \\ \phi_2(t) &= -\sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\sin 2\pi f_c t \end{aligned} \tag{4.3-21}$$

The output of the matched filters $\mathbf{r} = (r_1, r_2)$ is used to compute $C(\mathbf{r}, \mathbf{s}_m) =
2\mathbf{r} \cdot \mathbf{s}_m - \mathcal{E}_m$, and the largest is selected. The resulting decision regions depend on
the constellation shape, and in general the error probability does not have a closed
form.

To determine the probability of error for QAM, we must specify the signal point
constellation. We begin with QAM signal sets that have $M = 4$ points. Figure 4.3-6
illustrates two four-point signal sets. The first is a four-phase modulated signal, and the
second is a QAM signal with two amplitude levels, labeled $A_1$ and $A_2$, and four phases.
Because the probability of error is dominated by the minimum distance between pairs of
signal points, let us impose the condition that $d_{\min} = 2A$ for both signal constellations
and let us evaluate the average transmitter power, based on the premise that all signal
points are equally probable. For the four-phase signal, we have

$$\mathcal{E}_{avg} = \frac{1}{4}(4)(2A^2) = 2A^2 \tag{4.3-22}$$

For the two-amplitude, four-phase QAM, we place the points on circles of radii $A$ and
$\sqrt{3}A$. Thus, $d_{\min} = 2A$, and

$$\mathcal{E}_{avg} = \frac{1}{4}\left[2(3A^2) + 2A^2\right] = 2A^2 \tag{4.3-23}$$

which is the same average power as the $M = 4$-phase signal constellation. Hence, for all
practical purposes, the error rate performance of the two signal sets is the same. In other
words, there is no advantage of the two-amplitude QAM signal set over $M = 4$-phase
modulation.

FIGURE 4.3-6

Two four-point signal constellations, (a) and (b).

Next, let us consider $M = 8$-QAM. In this case, there are many possible signal
constellations. We shall consider the four signal constellations shown in Figure 4.3-7,
all of which consist of two amplitudes and have a minimum distance between signal
points of $2A$. The coordinates $(A_{mc}, A_{ms})$ for each signal point, normalized by $A$, are
given in the figure. Assuming that the signal points are equally probable, the average
transmitted signal energy is

$$\mathcal{E}_{avg} = \frac{1}{M}\sum_{m=1}^{M}\left(A_{mc}^2 + A_{ms}^2\right) = \frac{A^2}{M}\sum_{m=1}^{M}\left(a_{mc}^2 + a_{ms}^2\right) \tag{4.3-24}$$

where $(a_{mc}, a_{ms})$ are the coordinates of the signal points, normalized by $A$.

FIGURE 4.3-7

Four eight-point signal constellations.

The two signal sets (a) and (c) in Figure 4.3-7 contain signal points that fall on a
rectangular grid and have $\mathcal{E}_{avg} = 6A^2$. The signal set (b) requires an average transmitted
energy $\mathcal{E}_{avg} = 6.83A^2$, and (d) requires $\mathcal{E}_{avg} = 4.73A^2$. Therefore, the fourth signal set
requires approximately 1 dB less energy than the two rectangular sets (a) and (c) and 1.6 dB
less energy than set (b), to achieve the same probability of error. This signal constellation is known
to be the best eight-point QAM constellation because it requires the least power for a
given minimum distance between signal points.
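Equation 4.3-24 can be evaluated directly once the point coordinates are known. Figure 4.3-7 is not reproduced here, so the coordinates below are assumptions chosen to be consistent with the quoted energies and with $d_{\min} = 2A$ (with $A = 1$); they may differ from the figure in orientation or labeling.

```python
import math

def avg_energy_and_dmin(points):
    """Average energy (Eq. 4.3-24 with A = 1) and minimum distance of a constellation."""
    e_avg = sum(x * x + y * y for x, y in points) / len(points)
    d_min = min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])
    return e_avg, d_min

signs = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
r2, r3 = 1 + math.sqrt(2), 1 + math.sqrt(3)

# Assumed coordinates for the four eight-point sets, normalized by A.
constellations = {
    "a": [(sx * x, sy) for x in (1, 3) for sx, sy in signs],
    "b": [(sx, sy * r2) for sx, sy in signs] + [(sx * r2, sy) for sx, sy in signs],
    "c": [(2 * sx, 2 * sy) for sx, sy in signs] + [(2, 0), (-2, 0), (0, 2), (0, -2)],
    "d": [(sx, sy) for sx, sy in signs] + [(r3, 0), (-r3, 0), (0, r3), (0, -r3)],
}

energies = {name: avg_energy_and_dmin(pts)[0] for name, pts in constellations.items()}
gain_over_rect_db = 10 * math.log10(energies["a"] / energies["d"])
gain_over_b_db = 10 * math.log10(energies["b"] / energies["d"])
```

With these coordinates, the energies of sets (a) to (d) and the two gains in dB reproduce the values stated in the text to the quoted precision.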

For $M \ge 16$, there are many more possibilities for selecting the QAM signal points
in two-dimensional space. For example, we may choose a circular multiamplitude
constellation for $M = 16$, as shown in Figure 3.2-4. In this case, the signal points at a
given amplitude level are phase-rotated by $\frac{1}{4}\pi$ relative to the signal points at adjacent
amplitude levels. This 16-QAM constellation is a generalization of the optimum 8-QAM
constellation. However, the circular 16-QAM constellation is not the best 16-point QAM
signal constellation for the AWGN channel.

Rectangular QAM signal constellations have the distinct advantage of being easily
generated as two PAM signals impressed on the in-phase and quadrature carriers. In
addition, they are easily demodulated. Although they are not the best $M$-ary QAM
signal constellations for $M \ge 16$, the average transmitted power required to achieve
a given minimum distance is only slightly greater than the average power required for
the best $M$-ary QAM signal constellation. For these reasons, rectangular $M$-ary QAM
signals are most frequently used in practice.

In the special case where $k$ is even and the constellation is square, it is possible to
derive an exact expression for the error probability. This particular case was previously
studied in Section 3.2-3 in Equations 3.2-42 to 3.2-44. In particular, the minimum
distance of this constellation is given by

$$d_{\min} = \sqrt{\frac{6\log_2 M}{M - 1}\,\mathcal{E}_{bavg}} \tag{4.3-25}$$

Note that this constellation can be considered as two $\sqrt{M}$-ary PAM constellations in
the in-phase and quadrature directions. An error occurs if either $n_1$ or $n_2$ is large enough
to cause an error in one of the two PAM signals. The probability of a correct detection
for this QAM constellation is therefore the product of correct decision probabilities for
constituent PAM systems, i.e.,

$$P_{c,M\text{-QAM}} = P_{c,\sqrt{M}\text{-PAM}}^2 = \left(1 - P_{e,\sqrt{M}\text{-PAM}}\right)^2 \tag{4.3-26}$$

resulting in

$$P_{e,M\text{-QAM}} = 1 - \left(1 - P_{e,\sqrt{M}\text{-PAM}}\right)^2 = 2P_{e,\sqrt{M}\text{-PAM}}\left(1 - \frac{1}{2}P_{e,\sqrt{M}\text{-PAM}}\right) \tag{4.3-27}$$

But, from Equation 4.3-4,

$$P_{e,\sqrt{M}\text{-PAM}} = 2\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\frac{d_{\min}}{\sqrt{2N_0}}\right) \tag{4.3-28}$$

in which we need to substitute $d_{\min}$ from Equation 4.3-25 to obtain

$$P_{e,\sqrt{M}\text{-PAM}} = 2\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{3\log_2 M}{M - 1}\,\frac{\mathcal{E}_{bavg}}{N_0}}\right) \tag{4.3-29}$$

Substituting Equation 4.3-29 into Equation 4.3-27 yields

$$\begin{aligned} P_{e,M\text{-QAM}} &= 4\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{3\log_2 M}{M - 1}\,\frac{\mathcal{E}_{bavg}}{N_0}}\right) \left[1 - \left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{3\log_2 M}{M - 1}\,\frac{\mathcal{E}_{bavg}}{N_0}}\right)\right] \\ &\le 4Q\left(\sqrt{\frac{3\log_2 M}{M - 1}\,\frac{\mathcal{E}_{bavg}}{N_0}}\right) \end{aligned} \tag{4.3-30}$$
FIGURE 4.3-8

Probability of a symbol error for QAM. (Horizontal axis: SNR per bit, $\gamma_b$ (dB).)

For large $M$ and moderate to high SNR per bit, the upper bound given in
Equation 4.3-30 is quite tight. Figure 4.3-8 illustrates plots of the message error
probability of $M$-ary QAM as a function of SNR per bit. Although Equation 4.3-30 is obtained
for square constellations, for large $M$ it gives a good approximation for general QAM
constellations with $M = 2^k$ points which are either in the shape of a square (when $k$
is even) or in the shape of a cross (when $k$ is odd). These types of constellations are
illustrated in Figure 3.2-5.

Comparing the error performance of $M$-ary QAM with $M$-ary ASK and $M$-ary PSK
given in Equations 4.3-5 and 4.3-19, respectively, we observe that, unlike PAM and
PSK signaling, in which the penalty for increasing the rate was 6 dB/bit, in QAM
this penalty is 3 dB/bit. This shows that QAM is more power efficient than
PAM and PSK. The advantage of PSK is, however, its constant-envelope
property.
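The exact expression and the upper bound of Equation 4.3-30 for square QAM, along with the roughly 3 dB/bit penalty, can be sketched as follows. The function names are hypothetical, and the penalty is again measured as the SNR/bit increase that keeps the Q-function argument fixed when $M$ is doubled.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def qam_ser(M, ebn0_db):
    """Exact symbol error probability of square M-QAM, first line of Eq. 4.3-30."""
    g = 10.0 ** (ebn0_db / 10.0)
    q = q_func(math.sqrt(3.0 * math.log2(M) / (M - 1) * g))
    c = 1.0 - 1.0 / math.sqrt(M)
    return 4.0 * c * q * (1.0 - c * q)

def qam_ser_bound(M, ebn0_db):
    """Upper bound 4Q(.), second line of Eq. 4.3-30."""
    g = 10.0 ** (ebn0_db / 10.0)
    return 4.0 * q_func(math.sqrt(3.0 * math.log2(M) / (M - 1) * g))

def qam_doubling_penalty_db(M):
    """SNR/bit increase (dB) keeping the Q-function argument fixed from M to 2M."""
    scale = lambda m: 3.0 * math.log2(m) / (m - 1)
    return 10.0 * math.log10(scale(M) / scale(2 * M))

ser_16 = qam_ser(16, 10.0)
bound_16 = qam_ser_bound(16, 10.0)
penalty = qam_doubling_penalty_db(2 ** 20)   # close to 3 dB for large M
```

For $M = 16$ the exact value reduces to $3Q - \frac{9}{4}Q^2$ with argument $\sqrt{4\mathcal{E}_{bavg}/(5N_0)}$, which is the expression quoted in Equation 4.2-85.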

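EXAMPLE 4.3-1. QPSK can be considered as 4-QAM with a square constellation.
Using Equation 4.3-30 with $M = 4$, we obtain

$$P_4 = 2Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)\left[1 - \frac{1}{2}Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)\right] \le 2Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right) \tag{4.3-31}$$

which is in agreement with Equation 4.3-15. For 16-QAM with a rectangular
constellation we obtain

$$P_{16} = 3Q\left(\sqrt{\frac{4\mathcal{E}_{bavg}}{5N_0}}\right)\left[1 - \frac{3}{4}Q\left(\sqrt{\frac{4\mathcal{E}_{bavg}}{5N_0}}\right)\right] \le 3Q\left(\sqrt{\frac{4\mathcal{E}_{bavg}}{5N_0}}\right) \tag{4.3-32}$$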

For nonrectangular QAM signal constellations, we may upper-bound the error
probability by use of the union bound as

$$P_M \le (M - 1)\,Q\left(\sqrt{\frac{d_{\min}^2}{2N_0}}\right) \tag{4.3-33}$$

where $d_{\min}$ is the minimum Euclidean distance of the constellation. This bound may be
loose when $M$ is large. In such a case, we may approximate $P_M$ by replacing $M - 1$ by
$N_{\min}$, where $N_{\min}$ is the largest number of neighboring points that are at distance $d_{\min}$
from any constellation point. More discussion on the performance of general QAM
signaling schemes is given in Section 4.7.

It is interesting to compare the performance of QAM with that of PSK for any
given signal size $M$, since both types of signals are two-dimensional. Recall that by
Equation 4.3-17, for $M$-ary PSK, the probability of a symbol error is approximated as

$$P_M \approx 2Q\left(\sqrt{2\gamma_s}\,\sin\frac{\pi}{M}\right) \tag{4.3-34}$$

For $M$-ary QAM, we may use the expression 4.3-30. Since the error probability is
dominated by the argument of the Q function, we may simply compare the arguments
of Q for the two signal formats. Thus, the ratio of these two arguments (squared, and
expressed in terms of the SNR per symbol) is

$$\mathcal{R}_M = \frac{3/(M - 1)}{2\sin^2(\pi/M)} \tag{4.3-35}$$

For example, when $M = 4$, we have $\mathcal{R}_M = 1$. Hence, 4-PSK and 4-QAM yield
comparable performance for the same SNR per symbol. This was noted in Example 4.3-1.
On the other hand, when $M > 4$, we find that $\mathcal{R}_M > 1$, so that $M$-ary QAM yields
better performance than $M$-ary PSK. Table 4.3-1 illustrates the SNR advantage of QAM
over PSK for several values of $M$. For example, we observe that 32-QAM has a 7-dB
SNR advantage over 32-PSK.


TABLE 4.3-1

SNR Advantage of M-ary QAM over M-ary PSK

M      10 log10 R_M (dB)
8      1.65
16     4.20
32     7.02
64     9.95
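The entries of Table 4.3-1 follow directly from Equation 4.3-35. A short sketch (the function name is hypothetical):

```python
import math

def qam_over_psk_db(M):
    """SNR advantage of M-ary QAM over M-ary PSK in dB, from Eq. 4.3-35."""
    ratio = (3.0 / (M - 1)) / (2.0 * math.sin(math.pi / M) ** 2)
    return 10.0 * math.log10(ratio)

table = {M: qam_over_psk_db(M) for M in (4, 8, 16, 32, 64)}
```

The value for $M = 4$ is 0 dB, consistent with the observation that 4-PSK and 4-QAM perform comparably.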
4.3-4 Demodulation and Detection

ASK, PSK, and QAM have one- or two-dimensional constellations with orthonormal
basis of the form

$$\begin{aligned} \phi_1(t) &= \sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\cos 2\pi f_c t \\ \phi_2(t) &= -\sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\sin 2\pi f_c t \end{aligned} \tag{4.3-36}$$

for PSK and QAM and

$$\phi_1(t) = \sqrt{\frac{2}{\mathcal{E}_g}}\,g(t)\cos 2\pi f_c t \tag{4.3-37}$$

for ASK. The optimal detector in these systems requires filters matched to $\phi_1(t)$ and
$\phi_2(t)$. Since both the received signal $r(t)$ and the basis functions are high frequency
bandpass signals, the filtering process, if implemented in software, requires high
sampling rates.

To alleviate this requirement, we can first demodulate the received signal to obtain 
its lowpass equivalent signal and then perform the detection on this signal. The process 
of demodulation was previously discussed in Section 2.1-2 and the block diagram of 
the demodulator is repeated in Figure 4.3-9. 


FIGURE 4.3-9

Complex (a) and real (b) demodulators. A general representation for a demodulator is shown
in (c).
It is important to note that the demodulation process is an invertible process. We
have seen in Section 4.1-1 that invertible preprocessing does not affect optimality
of the receiver. Therefore, the optimal detector designed for the demodulated signal
performs as well as the optimal detector designed for the bandpass signal. The benefit of
the demodulator-detector implementation is that in this structure the signal processing
required for the detection is done on the demodulated lowpass signal, thus reducing the
complexity of the receiver.

Recall from Equations 2.1-21 and 2.1-24 that $\mathcal{E}_x = \frac{1}{2}\mathcal{E}_{x_l}$ and $\langle x(t), y(t)\rangle =
\frac{1}{2}\mathrm{Re}\left[\langle x_l(t), y_l(t)\rangle\right]$. From these relations the optimal detection rule

$$\hat{m} = \arg\max_{1 \le m \le M}\left(\mathbf{r} \cdot \mathbf{s}_m + \frac{N_0}{2}\ln P_m - \frac{1}{2}\mathcal{E}_m\right) \tag{4.3-38}$$

can be written in the following lowpass equivalent form

$$\hat{m} = \arg\max_{1 \le m \le M}\left(\mathrm{Re}\left[\mathbf{r}_l \cdot \mathbf{s}_{ml}\right] + N_0\ln P_m - \frac{1}{2}\mathcal{E}_{ml}\right) \tag{4.3-39}$$

or, equivalently,

$$\hat{m} = \arg\max_{1 \le m \le M}\left(\mathrm{Re}\left[\int_{-\infty}^{\infty} r_l(t)s_{ml}^*(t)\,dt\right] + N_0\ln P_m - \frac{1}{2}\int_{-\infty}^{\infty} |s_{ml}(t)|^2\,dt\right) \tag{4.3-40}$$

The ML detection rule is obviously

$$\hat{m} = \arg\max_{1 \le m \le M}\left(\mathrm{Re}\left[\int_{-\infty}^{\infty} r_l(t)s_{ml}^*(t)\,dt\right] - \frac{1}{2}\int_{-\infty}^{\infty} |s_{ml}(t)|^2\,dt\right) \tag{4.3-41}$$

Equations 4.3-39 to 4.3-41 are baseband detection rules after demodulation.

The implementation of Equations 4.3-39 to 4.3-41 can be done either in the form
of a correlation receiver or in the form of matched filters, where the matched filters
are of the form $s_{ml}^*(T - t)$ or $\phi_l^*(T - t)$. Figure 4.3-10 shows the schematic diagram
for a complex matched filter, and Figure 4.3-11 illustrates the detailed structure of a
complex matched filter in terms of its in-phase and quadrature components. Note that
for ASK, PSK, and QAM we have $s_{ml}(t) = A_m g(t)$, where $A_m$ is in general a complex
number (real for ASK). Therefore $\phi_l(t) = g(t)/\sqrt{\mathcal{E}_g}$ serves as the basis function, and
the signal points are represented by complex numbers of the form $A_m\sqrt{\mathcal{E}_g}$. Also note
that for PSK detection the last term in Equation 4.3-41 can be dropped.

Throughout this discussion we have assumed that the receiver has complete knowledge
of the carrier frequency and phase. This requires full synchronization between the
transmitter and the receiver. In Section 4.5 we will study the case where the carrier
generated at the receiver is not in phase coherence with the transmitter carrier.

FIGURE 4.3-10

Complex lowpass equivalent matched filter. [The input $r_l(t)$ is passed through the filter
$s_{ml}^*(T - t)$ and sampled at $t = T$ to give $\int_{-\infty}^{\infty} r_l(t) s_{ml}^*(t)\,dt$.]

FIGURE 4.3-11

Equivalent lowpass matched filter.

4.4 OPTIMAL DETECTION AND ERROR PROBABILITY FOR POWER-LIMITED SIGNALING

Orthogonal, biorthogonal, and simplex signaling are characterized by high-dimensional
constellations. As we will see in this section, these signaling schemes are more
power-efficient but less bandwidth-efficient than ASK, PSK, and QAM. We begin our study
with orthogonal signaling and then extend our results to biorthogonal and simplex
signals.

4.4-1 Optimal Detection and Error Probability for Orthogonal Signaling 

In an equal-energy orthogonal signaling scheme, $N = M$ and the vector representation
of the signals is given by

\[ \begin{aligned}
\mathbf{s}_1 &= (\sqrt{\mathcal{E}}, 0, \ldots, 0) \\
\mathbf{s}_2 &= (0, \sqrt{\mathcal{E}}, \ldots, 0) \\
&\;\;\vdots \\
\mathbf{s}_M &= (0, \ldots, 0, \sqrt{\mathcal{E}})
\end{aligned} \tag{4.4-1} \]

For equiprobable, equal-energy orthogonal signals, the optimum detector selects
the signal resulting in the largest cross-correlation between the received vector $\mathbf{r}$ and
each of the $M$ possible transmitted signal vectors $\{\mathbf{s}_m\}$, i.e.,

\[ \hat{m} = \arg\max_{1 \le m \le M} \mathbf{r} \cdot \mathbf{s}_m \tag{4.4-2} \]

By symmetry of the constellation and by observing that the distance between any pair of
signal points in the constellation is equal to $\sqrt{2\mathcal{E}}$, we conclude that the error probability
is independent of the transmitted signal. Therefore, to evaluate the probability of error,
we can suppose that the signal $\mathbf{s}_1$ is transmitted. With this assumption, the received
signal vector is

\[ \mathbf{r} = (\sqrt{\mathcal{E}} + n_1, n_2, n_3, \ldots, n_M) \tag{4.4-3} \]

where $\mathcal{E}$ denotes the symbol energy and $n_1, n_2, \ldots, n_M$ are zero-mean, mutually
statistically independent Gaussian random variables with equal variance $\sigma_n^2 = \frac{N_0}{2}$.

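Let us define random variables $R_m$, $1 \le m \le M$, as

\[ R_m = \mathbf{r} \cdot \mathbf{s}_m \tag{4.4-4} \]

With this definition and from Equations 4.4-3 and 4.4-1, we have

\[ \begin{aligned}
R_1 &= \mathcal{E} + \sqrt{\mathcal{E}}\, n_1 \\
R_m &= \sqrt{\mathcal{E}}\, n_m, \qquad 2 \le m \le M
\end{aligned} \tag{4.4-5} \]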


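Since we are assuming that $\mathbf{s}_1$ was transmitted, the detector makes a correct decision
if $R_1 > R_m$ for $m = 2, 3, \ldots, M$. Therefore, the probability of a correct decision is
given by

\[ \begin{aligned}
P_c &= P[R_1 > R_2, R_1 > R_3, \ldots, R_1 > R_M \mid \mathbf{s}_1 \text{ sent}] \\
&= P[\sqrt{\mathcal{E}} + n_1 > n_2, \sqrt{\mathcal{E}} + n_1 > n_3, \ldots, \sqrt{\mathcal{E}} + n_1 > n_M \mid \mathbf{s}_1 \text{ sent}]
\end{aligned} \tag{4.4-6} \]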


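Events $\sqrt{\mathcal{E}} + n_1 > n_2$, $\sqrt{\mathcal{E}} + n_1 > n_3$, ..., $\sqrt{\mathcal{E}} + n_1 > n_M$ are not independent due
to the existence of the random variable $n_1$ in all of them. We can, however, condition
on $n_1$ to make these events independent. Therefore, we have

\[ \begin{aligned}
P_c &= \int_{-\infty}^{\infty} P[n_2 < n + \sqrt{\mathcal{E}}, n_3 < n + \sqrt{\mathcal{E}}, \ldots, n_M < n + \sqrt{\mathcal{E}} \mid \mathbf{s}_1 \text{ sent}, n_1 = n]\, p_{n_1}(n)\,dn \\
&= \int_{-\infty}^{\infty} \left( P[n_2 < n + \sqrt{\mathcal{E}} \mid \mathbf{s}_1 \text{ sent}, n_1 = n] \right)^{M-1} p_{n_1}(n)\,dn
\end{aligned} \tag{4.4-7} \]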

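where in the last step we have used the fact that the $n_m$'s are iid random variables for
$m = 2, 3, \ldots, M$. We have

\[ P[n_2 < n + \sqrt{\mathcal{E}} \mid \mathbf{s}_1 \text{ sent}, n_1 = n] = 1 - Q\left( \frac{n + \sqrt{\mathcal{E}}}{\sqrt{N_0/2}} \right) \tag{4.4-8} \]

Hence,

\[ P_c = \frac{1}{\sqrt{\pi N_0}} \int_{-\infty}^{\infty} \left[ 1 - Q\left( \frac{n + \sqrt{\mathcal{E}}}{\sqrt{N_0/2}} \right) \right]^{M-1} e^{-n^2/N_0}\,dn \tag{4.4-9} \]

and

\[ P_e = 1 - P_c = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left[ 1 - (1 - Q(x))^{M-1} \right] e^{-\frac{\left( x - \sqrt{2\mathcal{E}/N_0} \right)^2}{2}}\,dx \tag{4.4-10} \]

where we have introduced a new variable $x = \frac{n + \sqrt{\mathcal{E}}}{\sqrt{N_0/2}}$. In general, Equation 4.4-10 cannot
be made simpler, and the error probability can be found numerically for different values
of the SNR.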
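
As an illustration of the numerical evaluation mentioned above, the following Python sketch integrates Equation 4.4-10 with Simpson's rule and converts the result to a bit error probability with the factor $2^{k-1}/(2^k - 1)$. The function names, the truncation of the integral to ten standard deviations around the mean, and the step count are assumptions made for this sketch, not part of the text.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pe_orthogonal(M, es_n0, half_width=10.0, steps=4000):
    """Symbol error probability of coherent M-ary orthogonal signaling
    (Equation 4.4-10). es_n0 is the linear (not dB) ratio E/N0."""
    mu = math.sqrt(2.0 * es_n0)
    a = mu - half_width
    h = 2.0 * half_width / steps          # steps must be even for Simpson's rule

    def integrand(x):
        q = q_func(x)
        # log(1 - Q(x)), written to avoid cancellation at either tail
        log_cdf = math.log1p(-q) if q < 0.5 else math.log(q_func(-x))
        tail = -math.expm1((M - 1) * log_cdf)     # 1 - (1 - Q(x))^(M-1)
        return tail * math.exp(-0.5 * (x - mu) ** 2)

    total = integrand(a) + integrand(a + 2.0 * half_width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3.0 / math.sqrt(2.0 * math.pi)

def pb_orthogonal(k, eb_n0):
    """Bit error probability for M = 2^k, using E = k * Eb."""
    M = 2 ** k
    return (M // 2) / (M - 1) * pe_orthogonal(M, k * eb_n0)
```

For $M = 2$ the integral has the closed form $Q(\sqrt{\mathcal{E}/N_0})$, which gives a convenient check of the numerical routine.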

In orthogonal signaling, due to the symmetry of the constellation, the probabilities
of receiving any of the messages $m = 2, 3, \ldots, M$, when $\mathbf{s}_1$ is transmitted, are equal.
Therefore, for any $2 \le m \le M$,

\[ P[\mathbf{s}_m \text{ received} \mid \mathbf{s}_1 \text{ sent}] = \frac{P_e}{M - 1} = \frac{P_e}{2^k - 1} \tag{4.4-11} \]

Let us assume that $\mathbf{s}_1$ corresponds to a data sequence of length $k$ with a 0 at the first
component. The probability of an error at this component is the probability of detecting
an $\mathbf{s}_m$ corresponding to a sequence with a 1 at the first component. Since there are $2^{k-1}$
such sequences, we have

\[ P_b = 2^{k-1} \frac{P_e}{2^k - 1} = \frac{2^{k-1}}{2^k - 1} P_e \approx \frac{1}{2} P_e \tag{4.4-12} \]

where the last approximation is valid for $k \gg 1$.

The graphs of the probability of a binary digit error as a function of the SNR per
bit, $\mathcal{E}_b/N_0$, are shown in Figure 4.4-1 for $M = 2, 4, 8, 16, 32$, and 64. This figure
illustrates that, by increasing the number $M$ of waveforms, one can reduce the SNR
per bit required to achieve a given probability of a bit error. For example, to achieve
$P_b = 10^{-5}$, the required SNR per bit is a little more than 12 dB for $M = 2$; but if $M$
is increased to 64 signal waveforms ($k = 6$ bits per symbol), the required SNR per
bit is approximately 6 dB. Thus, a savings of over 6 dB (a factor-of-4 reduction) is
realized in transmitter power (or energy) required to achieve $P_b = 10^{-5}$ by increasing
$M$ from $M = 2$ to $M = 64$. This property is in direct contrast with the performance
characteristics of ASK, PSK, and QAM signaling, for which increasing $M$ increases
the required power to achieve a given error probability.


Error Probability in FSK Signaling 

From Equation 3.2-58 and the discussion following it, we have seen that FSK signaling
becomes a special case of orthogonal signaling when the frequency separation $\Delta f$ is
given by

\[ \Delta f = \frac{l}{2T} \tag{4.4-13} \]

for a positive integer $l$. For this value of frequency separation the error probability of
M-ary FSK is given by Equation 4.4-10.

FIGURE 4.4-1

Probability of bit error for orthogonal signaling. [Horizontal axis: SNR per bit, $\gamma_b$ (dB).]

Note that in the binary FSK signaling, a frequency separation that guarantees
orthogonality does not minimize the error probability. In Problem 4.18 it is shown that
the error probability of binary FSK is minimized when the frequency separation is of
the form

\[ \Delta f = \frac{0.715}{T} \tag{4.4-14} \]
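
The value 0.715 can be reproduced numerically. The sketch below assumes two standard facts that are not derived in this passage: the real correlation coefficient of two equal-energy FSK tones separated by $\Delta f$ over a symbol of duration $T$ is $\operatorname{sinc}(2\Delta f T)$, and the error probability of coherently detected equal-energy binary signals decreases as that correlation decreases. The grid and function name are hypothetical.

```python
import math

def fsk_corr(df_t):
    """Real correlation coefficient sinc(2*df*T) of two FSK tones,
    where df_t is the product df*T."""
    u = 2.0 * math.pi * df_t
    return 1.0 if u == 0.0 else math.sin(u) / u

# Grid search over df*T from 0.0001 to 2 for the most negative correlation.
grid = [i / 10000.0 for i in range(1, 20001)]
best = min(grid, key=fsk_corr)
```

On this grid the minimum falls near `best = 0.715`, where the correlation is negative, whereas the orthogonal spacing `df_t = 0.5` gives zero correlation.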


A Union Bound on the Probability of Error in Orthogonal Signaling 

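The union bound derived in Section 4.2-3 states that

\[ P_e \le \frac{M - 1}{2} e^{-\frac{d_{\min}^2}{4N_0}} \tag{4.4-15} \]

In orthogonal signaling $d_{\min} = \sqrt{2\mathcal{E}}$; therefore,

\[ P_e \le \frac{M - 1}{2} e^{-\frac{\mathcal{E}}{2N_0}} < M e^{-\frac{\mathcal{E}}{2N_0}} \tag{4.4-16} \]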
Using $M = 2^k$ and $\mathcal{E}_b = \mathcal{E}/k$, we have

\[ P_e < 2^k e^{-\frac{k\mathcal{E}_b}{2N_0}} = e^{-\frac{k}{2}\left( \frac{\mathcal{E}_b}{N_0} - 2\ln 2 \right)} \tag{4.4-17} \]

It is clear from Equation 4.4-17 that if

\[ \frac{\mathcal{E}_b}{N_0} > 2 \ln 2 = 1.39 \sim 1.42 \text{ dB} \tag{4.4-18} \]

then $P_e \to 0$ as $k \to \infty$. In other words, if the SNR per bit exceeds 1.42 dB, then
reliable communication† is possible.
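
A short Python sketch makes the threshold behavior of Equation 4.4-17 concrete. The helper names are hypothetical, and the SNR values used in the checks (linear ratios 2.0 and 1.0) are arbitrary points chosen on either side of $2 \ln 2$.

```python
import math

def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)

def union_bound(k, eb_n0):
    """Right-hand side of Equation 4.4-17, 2^k * exp(-k*Eb/(2*N0)),
    with eb_n0 given as a linear ratio."""
    return math.exp(-0.5 * k * (eb_n0 - 2.0 * math.log(2.0)))

threshold_db = to_db(2.0 * math.log(2.0))   # about 1.42 dB
```

Above the threshold the bound shrinks as $k$ grows; below it the bound grows past 1 and says nothing useful.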

One can ask whether the condition SNR per bit > 1.42 dB is necessary, as well as
being sufficient, for reliable communication. We will see in Chapter 6 that this condition
is not necessary. We will show there that a necessary and sufficient condition for reliable
communication is

\[ \frac{\mathcal{E}_b}{N_0} > \ln 2 = 0.693 \sim -1.6 \text{ dB} \tag{4.4-19} \]

Thus, reliable communication at SNR per bit lower than -1.6 dB is impossible. The
reason that Equation 4.4-17 does not result in this tighter bound is that the union bound
is not tight enough at low SNRs. To obtain the -1.6 dB bound, more sophisticated
bounding techniques are required. By using these bounding techniques it can be shown
that

\[ P_e \le \begin{cases}
2 e^{-k\left( \sqrt{\mathcal{E}_b/N_0} - \sqrt{\ln 2} \right)^2} & \ln 2 \le \frac{\mathcal{E}_b}{N_0} \le 4 \ln 2 \\
2 e^{-\frac{k}{2}\left( \frac{\mathcal{E}_b}{N_0} - 2\ln 2 \right)} & \frac{\mathcal{E}_b}{N_0} > 4 \ln 2
\end{cases} \tag{4.4-20} \]

The minimum value of SNR per bit needed for reliable communication, i.e., -1.6 dB,
is called the Shannon limit. We will discuss this topic and the notion of channel capacity
in greater detail in Chapter 6.

†We say reliable communication is possible if we can make the error probability as small as desired.


4.4-2 Optimal Detection and Error Probability for Biorthogonal Signaling

As indicated in Section 3.2-4, a set of $M = 2^k$ biorthogonal signals is constructed
from $\frac{1}{2}M$ orthogonal signals by including the negatives of the orthogonal signals.
Thus, we achieve a reduction in the complexity of the demodulator for the biorthogonal
signals relative to that for orthogonal signals, since the former is implemented with
$\frac{1}{2}M$ cross-correlators or matched filters, whereas the latter requires $M$ matched filters,
or cross-correlators. In biorthogonal signaling $N = \frac{1}{2}M$, and the vector representations
of the signals are given by

\[ \begin{aligned}
\mathbf{s}_1 = -\mathbf{s}_{N+1} &= (\sqrt{\mathcal{E}}, 0, \ldots, 0) \\
\mathbf{s}_2 = -\mathbf{s}_{N+2} &= (0, \sqrt{\mathcal{E}}, \ldots, 0) \\
&\;\;\vdots \\
\mathbf{s}_N = -\mathbf{s}_{2N} &= (0, \ldots, 0, \sqrt{\mathcal{E}})
\end{aligned} \tag{4.4-21} \]

To evaluate the probability of error for the optimum detector, let us assume that the
signal $s_1(t)$ corresponding to the vector $\mathbf{s}_1 = (\sqrt{\mathcal{E}}, 0, \ldots, 0)$ was transmitted. Then the
received signal vector is

\[ \mathbf{r} = (\sqrt{\mathcal{E}} + n_1, n_2, \ldots, n_N) \tag{4.4-22} \]

where the $\{n_m\}$ are zero-mean, mutually statistically independent and identically
distributed Gaussian random variables with variance $\sigma_n^2 = \frac{1}{2}N_0$. Since all signals are
equiprobable and have equal energy, the optimum detector decides in favor of the
signal corresponding to the largest in magnitude of the cross-correlators

\[ C(\mathbf{r}, \mathbf{s}_m) = \mathbf{r} \cdot \mathbf{s}_m, \qquad 1 \le m \le \tfrac{1}{2}M \tag{4.4-23} \]

while the sign of this largest term is used to decide whether $s_m(t)$ or $-s_m(t)$ was
transmitted. According to this decision rule, the probability of a correct decision is
equal to the probability that $r_1 = \sqrt{\mathcal{E}} + n_1 > 0$ and $r_1$ exceeds $|r_m| = |n_m|$ for
$m = 2, 3, \ldots, \frac{1}{2}M$. But


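\[ P[|n_m| < r_1 \mid r_1 > 0] = \frac{1}{\sqrt{\pi N_0}} \int_{-r_1}^{r_1} e^{-x^2/N_0}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-r_1/\sqrt{N_0/2}}^{r_1/\sqrt{N_0/2}} e^{-x^2/2}\,dx \tag{4.4-24} \]

Then the probability of a correct decision is

\[ P_c = \int_0^{\infty} \left[ \frac{1}{\sqrt{2\pi}} \int_{-r_1/\sqrt{N_0/2}}^{r_1/\sqrt{N_0/2}} e^{-x^2/2}\,dx \right]^{M/2-1} p(r_1)\,dr_1 \tag{4.4-25} \]

from which, upon substitution for $p(r_1)$, we obtain

\[ P_c = \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{2\mathcal{E}/N_0}}^{\infty} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\left( v + \sqrt{2\mathcal{E}/N_0} \right)}^{v + \sqrt{2\mathcal{E}/N_0}} e^{-x^2/2}\,dx \right]^{M/2-1} e^{-v^2/2}\,dv \tag{4.4-26} \]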


where we have used the PDF of $r_1$ as a Gaussian random variable with mean equal to
$\sqrt{\mathcal{E}}$ and variance $\frac{1}{2}N_0$. Finally, the probability of a symbol error is $P_e = 1 - P_c$. $P_c$, and
hence $P_e$, may be evaluated numerically for different values of $M$ from Equation 4.4-26.
The graph shown in Figure 4.4-2 illustrates $P_e$ as a function of $\mathcal{E}_b/N_0$, where $\mathcal{E} = k\mathcal{E}_b$,
for $M = 2, 4, 8, 16$, and 32. We observe that this graph is similar to that for orthogonal
signals (see Figure 4.4-1). However, in this case, the probability of error for $M = 4$
is greater than that for $M = 2$. This is due to the fact that we have plotted the symbol
error probability $P_e$ in Figure 4.4-2. If we plotted the equivalent bit error probability,
we should find that the graphs for $M = 2$ and $M = 4$ coincide. As in the case of
orthogonal signals, as $M \to \infty$ (or $k \to \infty$), the minimum required $\mathcal{E}_b/N_0$ to achieve
an arbitrarily small probability of error is -1.6 dB, the Shannon limit.

FIGURE 4.4-2

Probability of symbol error for biorthogonal signals. [Horizontal axis: SNR per bit, $\gamma_b$ (dB).]
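
The numerical evaluation of Equation 4.4-26 can be sketched as follows. Two choices here are assumptions of this sketch rather than the text's method: the inner integral is replaced by its closed form $\operatorname{erf}\big((v + \sqrt{2\mathcal{E}/N_0})/\sqrt{2}\big)$, and the outer integral is truncated at $v = 10$ and computed with Simpson's rule. The function name is hypothetical.

```python
import math

def pe_biorthogonal(M, es_n0, upper=10.0, steps=4000):
    """Symbol error probability 1 - Pc of M-ary biorthogonal signaling
    (Equation 4.4-26). es_n0 is the linear ratio E/N0."""
    mu = math.sqrt(2.0 * es_n0)
    a, b = -mu, upper
    h = (b - a) / steps                    # steps must be even for Simpson's rule

    def integrand(v):
        # Closed form of (1/sqrt(2*pi)) * integral of exp(-x^2/2) over [-(v+mu), v+mu]
        inner = math.erf((v + mu) / math.sqrt(2.0))
        return inner ** (M // 2 - 1) * math.exp(-0.5 * v * v)

    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    pc = total * h / 3.0 / math.sqrt(2.0 * math.pi)
    return 1.0 - pc
```

Two special cases give closed-form checks: $M = 2$ is binary antipodal signaling with $P_e = Q(\sqrt{2\mathcal{E}/N_0})$, and $M = 4$ has the geometry of QPSK with $P_e = 1 - \big(1 - Q(\sqrt{\mathcal{E}/N_0})\big)^2$.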


4.4-3 Optimal Detection and Error Probability for Simplex Signaling 


As we have seen in Section 3.2-4, simplex signals are obtained from a set of orthogonal
signals by shifting each signal by the average of the orthogonal signals. Since the signals
of an orthogonal signal set are simply shifted by a constant vector to obtain the simplex
signals, the geometry of the simplex signal set, i.e., the distance between signals and the
angle between lines joining signals, is exactly the same as that of the original orthogonal
signals. Therefore, the error probability of a set of simplex signals is given by the same
expression as the expression derived for orthogonal signals. However, since simplex
signals have a lower energy, as indicated by Equation 3.2-65, the energy in the expression
for error probability should be scaled accordingly. Therefore the expression for the error
probability in simplex signaling becomes

\[ P_e = 1 - P_c = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left[ 1 - (1 - Q(x))^{M-1} \right] e^{-\frac{\left( x - \sqrt{\frac{M}{M-1}\frac{2\mathcal{E}}{N_0}} \right)^2}{2}}\,dx \tag{4.4-27} \]

This indicates a relative gain of $10 \log_{10} \frac{M}{M-1}$ dB over orthogonal signaling. For $M = 2$, this
gain becomes 3 dB; for $M = 10$ it reduces to 0.46 dB; and as $M$ becomes larger, it
becomes negligible and the performance of orthogonal and simplex signals becomes
similar. Obviously, for simplex signals, similar to orthogonal and biorthogonal signals,
the error probability decreases as $M$ increases.
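
The quoted gains follow directly from the energy scaling factor $M/(M-1)$. A minimal check in Python (the function name is a hypothetical choice for this sketch):

```python
import math

def simplex_gain_db(M):
    """SNR gain of simplex over orthogonal signaling, 10*log10(M/(M-1)) dB."""
    return 10.0 * math.log10(M / (M - 1))
```

Evaluating `simplex_gain_db(2)` and `simplex_gain_db(10)` reproduces the 3 dB and 0.46 dB figures, and the gain falls below 0.01 dB by `M = 1000`.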


4.5 OPTIMAL DETECTION IN PRESENCE OF UNCERTAINTY: NONCOHERENT DETECTION

In the detection schemes we have studied so far, we made the implicit assumption that
the signals $\{s_m(t), 1 \le m \le M\}$ are available at the receiver. This assumption was in
the form of either the availability of the signals themselves or the availability of an
orthonormal basis $\{\phi_j(t), 1 \le j \le N\}$. Although in many communication systems this
assumption is valid, there are many cases in which we cannot make such an assumption.

One of the cases in which such an assumption is invalid occurs when transmission
over the channel introduces random changes to the signal as either a random attenuation
or a random phase shift. These situations will be studied in detail in Chapter 13. Another
situation that results in imperfect knowledge of the signals at the receiver arises when the
transmitter and the receiver are not perfectly synchronized. In this case, although the
receiver knows the general shape of $\{s_m(t)\}$, due to imperfect synchronization with
the transmitter, it can use only signals in the form of $\{s_m(t - t_d)\}$, where $t_d$ represents
the time slip between the transmitter and the receiver clocks. This time slip can be
modeled as a random variable.

To study the effect of random parameters of this type on the optimal receiver
design and performance, we consider the transmission of a set of signals over the
AWGN channel with some random parameter denoted by the random vector $\boldsymbol{\theta}$. We
assume that signals $\{s_m(t), 1 \le m \le M\}$ are transmitted, and the received signal $r(t)$
can be written as

\[ r(t) = s_m(t; \boldsymbol{\theta}) + n(t) \tag{4.5-1} \]

where $\boldsymbol{\theta}$ is in general a vector-valued random variable. By the Karhunen-Loève expansion
theorem discussed in Section 2.8-2, we can find an orthonormal basis for expansion
of the random process $s_m(t; \boldsymbol{\theta})$ and by Example 2.8-1, the same orthonormal basis can
be used for expansion of the white Gaussian noise process $n(t)$. By using this basis, the
waveform channel given in Equation 4.5-1 becomes equivalent to the vector channel

\[ \mathbf{r} = \mathbf{s}_{m,\boldsymbol{\theta}} + \mathbf{n} \tag{4.5-2} \]

for which the optimal detection rule is given by

\[ \begin{aligned}
\hat{m} &= \arg\max_{1 \le m \le M} P_m\, p(\mathbf{r} \mid m) \\
&= \arg\max_{1 \le m \le M} P_m \int p(\mathbf{r} \mid m, \boldsymbol{\theta})\, p(\boldsymbol{\theta})\,d\boldsymbol{\theta} \\
&= \arg\max_{1 \le m \le M} P_m \int p_{\mathbf{n}}(\mathbf{r} - \mathbf{s}_{m,\boldsymbol{\theta}})\, p(\boldsymbol{\theta})\,d\boldsymbol{\theta}
\end{aligned} \tag{4.5-3} \]
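Equation 4.5-3 represents the optimal decision rule and the resulting decision
regions. The minimum error probability, when the optimal detection rule of
Equation 4.5-3 is employed, is given by

\[ \begin{aligned}
P_e &= \sum_{m=1}^{M} P_m \int_{D_m^c} \left( \int p(\mathbf{r} \mid m, \boldsymbol{\theta})\, p(\boldsymbol{\theta})\,d\boldsymbol{\theta} \right) d\mathbf{r} \\
&= \sum_{m=1}^{M} P_m \sum_{\substack{m'=1 \\ m' \ne m}}^{M} \int_{D_{m'}} \left( \int p_{\mathbf{n}}(\mathbf{r} - \mathbf{s}_{m,\boldsymbol{\theta}})\, p(\boldsymbol{\theta})\,d\boldsymbol{\theta} \right) d\mathbf{r}
\end{aligned} \tag{4.5-4} \]

Equations 4.5-3 and 4.5-4 are quite general and can be used for all types of uncertainties
in channel parameters.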

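EXAMPLE 4.5-1. A binary antipodal signaling system with equiprobable signals
$s_1(t) = s(t)$ and $s_2(t) = -s(t)$ is used on an AWGN channel with noise power spectral
density of $\frac{N_0}{2}$. The channel introduces a random gain of $A$ which can take only
nonnegative values. In other words, the channel does not invert the polarity of the signal.
This channel can be modeled as

\[ r(t) = A\, s_m(t) + n(t) \tag{4.5-5} \]

where $A$ is a random gain with PDF $p(A)$ and $p(A) = 0$ for $A < 0$. Using
Equation 4.5-3, and noting that $p(r \mid m, A) = p_n(r - A s_m)$, the optimal decision region
for $s_1(t)$ is given by

\[ D_1 = \left\{ r : \int_0^{\infty} e^{-\frac{(r - A\sqrt{\mathcal{E}_b})^2}{N_0}} p(A)\,dA > \int_0^{\infty} e^{-\frac{(r + A\sqrt{\mathcal{E}_b})^2}{N_0}} p(A)\,dA \right\} \tag{4.5-6} \]

which simplifies to

\[ D_1 = \left\{ r : \int_0^{\infty} e^{-\frac{A^2 \mathcal{E}_b}{N_0}} \left( e^{\frac{2rA\sqrt{\mathcal{E}_b}}{N_0}} - e^{-\frac{2rA\sqrt{\mathcal{E}_b}}{N_0}} \right) p(A)\,dA > 0 \right\} \tag{4.5-7} \]

Since $A$ takes only positive values, the expression inside the parentheses is positive if
and only if $r > 0$. Therefore,

\[ D_1 = \{ r : r > 0 \} \tag{4.5-8} \]

To compute the error probability, we have

\[ \begin{aligned}
P_b &= \int_0^{\infty} \left( \int_0^{\infty} \frac{1}{\sqrt{\pi N_0}} e^{-\frac{(r + A\sqrt{\mathcal{E}_b})^2}{N_0}}\,dr \right) p(A)\,dA \\
&= \int_0^{\infty} Q\left( A\sqrt{\frac{2\mathcal{E}_b}{N_0}} \right) p(A)\,dA \\
&= E\left[ Q\left( A\sqrt{\frac{2\mathcal{E}_b}{N_0}} \right) \right]
\end{aligned} \tag{4.5-9} \]

where the expectation is taken with respect to $A$. For instance, if $A$ takes values $\frac{1}{2}$ and
1 with equal probability, then

\[ P_b = \frac{1}{2} Q\left( \sqrt{\frac{2\mathcal{E}_b}{N_0}} \right) + \frac{1}{2} Q\left( \sqrt{\frac{\mathcal{E}_b}{2N_0}} \right) \]

It is important to note that in this case the average received energy per bit is
$\mathcal{E}_{b\text{avg}} = \frac{1}{2}\mathcal{E}_b + \frac{1}{2}\left( \frac{1}{4}\mathcal{E}_b \right) = \frac{5}{8}\mathcal{E}_b$. In Problem 6.29 we show that
$P_b \ge Q\left( \sqrt{2\mathcal{E}_{b\text{avg}}/N_0} \right)$.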
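
The comparison at the end of the example can be checked numerically. The sketch below evaluates $E[Q(A\sqrt{2\mathcal{E}_b/N_0})]$ for a discrete gain distribution and compares it with a constant-gain channel of the same average received energy. The function names and the test value $\mathcal{E}_b/N_0 = 10$ (a linear ratio) are assumptions of this sketch.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pb_random_gain(eb_n0, gains, probs):
    """P_b = E[Q(A*sqrt(2*Eb/N0))] for a discrete random gain A."""
    return sum(p * q_func(a * math.sqrt(2.0 * eb_n0)) for a, p in zip(gains, probs))

def pb_same_average_energy(eb_n0, gains, probs):
    """Q(sqrt(2*Eb_avg/N0)) with Eb_avg = E[A^2]*Eb (constant-gain reference)."""
    mean_sq = sum(p * a * a for a, p in zip(gains, probs))
    return q_func(math.sqrt(2.0 * mean_sq * eb_n0))

gains, probs = [0.5, 1.0], [0.5, 0.5]
```

For these gains $E[A^2] = 5/8$, and the random-gain error probability is dominated by the weaker state, so it exceeds the constant-gain reference by a wide margin at moderate SNR.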


4.5-1 Noncoherent Detection of Carrier Modulated Signals 

For carrier modulated signals, $\{s_m(t), 1 \le m \le M\}$ are bandpass signals with lowpass
equivalents $\{s_{ml}(t), 1 \le m \le M\}$ where

\[ s_m(t) = \operatorname{Re}\left[ s_{ml}(t) e^{j2\pi f_c t} \right] \tag{4.5-10} \]

The AWGN channel model in general is given by

\[ r(t) = s_m(t - t_d) + n(t) \tag{4.5-11} \]

where $t_d$ indicates the random time asynchronism between the clocks of the transmitter
and the receiver. It is clearly seen that the received random process $r(t)$ is a function of
three random phenomena: the message $m$, which is selected with probability $P_m$, the
random variable $t_d$, and finally the random process $n(t)$.

From Equations 4.5-10 and 4.5-11 we have

\[ \begin{aligned}
r(t) &= \operatorname{Re}\left[ s_{ml}(t - t_d) e^{j2\pi f_c (t - t_d)} \right] + n(t) \\
&= \operatorname{Re}\left[ s_{ml}(t - t_d) e^{-j2\pi f_c t_d} e^{j2\pi f_c t} \right] + n(t)
\end{aligned} \tag{4.5-12} \]

Therefore, the lowpass equivalent of $s_m(t - t_d)$ is equal to $s_{ml}(t - t_d) e^{-j2\pi f_c t_d}$. In practice
$t_d \ll T_s$, where $T_s$ is the symbol duration. This means that the effect of a time shift of
size $t_d$ on $s_{ml}(t)$ is negligible. However, the term $e^{-j2\pi f_c t_d}$ can introduce a large phase
shift $\phi = -2\pi f_c t_d$ because even small values of $t_d$ are multiplied by the large carrier
frequency $f_c$, resulting in noticeable phase shifts. Since $t_d$ is random and even small
values of $t_d$ can cause large phase shifts that are folded modulo $2\pi$, we can model $\phi$ as
a random variable uniformly distributed between 0 and $2\pi$. This model of the channel
and detection of signals under this assumption is called noncoherent detection.
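
A small numerical example shows how a timing offset that is negligible on the symbol scale still produces a large carrier phase shift. All numbers below (a 2.4 GHz carrier, 1 microsecond symbols, a 1 ns offset) are hypothetical values chosen for illustration only.

```python
import math

def carrier_phase(fc_hz, td_s):
    """Phase shift phi = -2*pi*fc*td, folded into the interval [0, 2*pi)."""
    return (-2.0 * math.pi * fc_hz * td_s) % (2.0 * math.pi)

# Hypothetical system parameters (not from the text).
fc, ts, td = 2.4e9, 1e-6, 1e-9
phi = carrier_phase(fc, td)   # td is 0.1% of Ts, yet fc*td = 2.4 carrier cycles
```

Here the offset is one-thousandth of a symbol, but the carrier advances 2.4 cycles during it, so the folded phase is 0.6 of a full turn.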

From this discussion we conclude that in the noncoherent case

\[ \operatorname{Re}\left[ r_l(t) e^{j2\pi f_c t} \right] = \operatorname{Re}\left[ \left( e^{j\phi} s_{ml}(t) + n_l(t) \right) e^{j2\pi f_c t} \right] \tag{4.5-13} \]

or, in the baseband

\[ r_l(t) = e^{j\phi} s_{ml}(t) + n_l(t) \tag{4.5-14} \]

Note that by the discussion following Equation 2.9-14, the lowpass noise process $n_l(t)$
is circular and its statistics are independent of any rotation; hence we can ignore the
effect of phase rotation on the noise component. For the phase coherent case where
the receiver knows $\phi$, it can compensate for it, and the lowpass equivalent channel will
have the familiar form of

\[ r_l(t) = s_{ml}(t) + n_l(t) \tag{4.5-15} \]

In the noncoherent case, the vector equivalent of Equation 4.5-15 is given by

\[ \mathbf{r}_l = e^{j\phi} \mathbf{s}_{ml} + \mathbf{n}_l \tag{4.5-16} \]

To design the optimal detector for the baseband vector channel of Equation 4.5-16,
we use the general formulation of the optimal detector given in Equation 4.5-3 as

\[ \hat{m} = \arg\max_{1 \le m \le M} \frac{P_m}{2\pi} \int_0^{2\pi} p_{\mathbf{n}_l}\left( \mathbf{r}_l - e^{j\phi} \mathbf{s}_{ml} \right) d\phi \tag{4.5-17} \]

From Example 2.9-1 it is seen that $n_l(t)$ is a complex baseband random process with
power spectral density of $2N_0$ in the $[-W, W]$ frequency band. The projections of this
process on an orthonormal basis will have complex iid zero-mean Gaussian components
with variance $2N_0$ (variance $N_0$ per real and imaginary components). Therefore we can
write

\[ \hat{m} = \arg\max_{1 \le m \le M} \frac{P_m}{2\pi} \frac{1}{(4\pi N_0)^N} \int_0^{2\pi} e^{-\frac{\| \mathbf{r}_l - e^{j\phi} \mathbf{s}_{ml} \|^2}{4N_0}}\,d\phi \tag{4.5-18} \]


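Expanding the exponent, dropping terms that do not depend on $m$, and noting that
$\|\mathbf{s}_{ml}\|^2 = 2\mathcal{E}_m$, we obtain

\[ \begin{aligned}
\hat{m} &= \arg\max_{1 \le m \le M} \frac{P_m}{2\pi} e^{-\frac{\mathcal{E}_m}{2N_0}} \int_0^{2\pi} e^{\frac{1}{2N_0} \operatorname{Re}\left[ \mathbf{r}_l \cdot \left( e^{j\phi} \mathbf{s}_{ml} \right) \right]}\,d\phi \\
&= \arg\max_{1 \le m \le M} \frac{P_m}{2\pi} e^{-\frac{\mathcal{E}_m}{2N_0}} \int_0^{2\pi} e^{\frac{1}{2N_0} \operatorname{Re}\left[ (\mathbf{r}_l \cdot \mathbf{s}_{ml}) e^{-j\phi} \right]}\,d\phi \\
&= \arg\max_{1 \le m \le M} \frac{P_m}{2\pi} e^{-\frac{\mathcal{E}_m}{2N_0}} \int_0^{2\pi} e^{\frac{1}{2N_0} \operatorname{Re}\left[ |\mathbf{r}_l \cdot \mathbf{s}_{ml}| e^{-j(\phi - \theta)} \right]}\,d\phi \\
&= \arg\max_{1 \le m \le M} \frac{P_m}{2\pi} e^{-\frac{\mathcal{E}_m}{2N_0}} \int_0^{2\pi} e^{\frac{1}{2N_0} |\mathbf{r}_l \cdot \mathbf{s}_{ml}| \cos(\phi - \theta)}\,d\phi
\end{aligned} \tag{4.5-19} \]

where $\theta$ denotes the phase of $\mathbf{r}_l \cdot \mathbf{s}_{ml}$. Note that the integrand in Equation 4.5-19 is a
periodic function of $\phi$ with period $2\pi$, and we are integrating over a complete period;
therefore $\theta$ has no effect on the result. Using the relation

\[ I_0(x) = \frac{1}{2\pi} \int_0^{2\pi} e^{x \cos\phi}\,d\phi \tag{4.5-20} \]

where $I_0(x)$ is the modified Bessel function of the first kind and order zero, we obtain

\[ \hat{m} = \arg\max_{1 \le m \le M} P_m e^{-\frac{\mathcal{E}_m}{2N_0}} I_0\left( \frac{|\mathbf{r}_l \cdot \mathbf{s}_{ml}|}{2N_0} \right) \tag{4.5-21} \]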
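
The integral representation in Equation 4.5-20 is easy to verify numerically against the well-known power series $I_0(x) = \sum_{k \ge 0} (x/2)^{2k}/(k!)^2$. The sketch below is an illustration only; the function names, the midpoint rule, and the number of terms are choices made here.

```python
import math

def i0_integral(x, steps=2000):
    """I_0(x) from Equation 4.5-20, using the midpoint rule over one period."""
    h = 2.0 * math.pi / steps
    acc = sum(math.exp(x * math.cos((i + 0.5) * h)) for i in range(steps))
    return acc * h / (2.0 * math.pi)

def i0_series(x, terms=60):
    """I_0(x) from the power series sum_k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total
```

Both routines agree closely for moderate arguments, and the series form makes it plain that $I_0(x)$ increases with $x$ for $x > 0$.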
In general, the decision rule given in Equation 4.5-21 cannot be made simpler.
However, in the case of equiprobable and equal-energy signals, the terms $P_m$ and $\mathcal{E}_m$
can be ignored, and the optimal detection rule becomes

\[ \hat{m} = \arg\max_{1 \le m \le M} I_0\left( \frac{|\mathbf{r}_l \cdot \mathbf{s}_{ml}|}{2N_0} \right) \tag{4.5-22} \]

Since for $x > 0$, $I_0(x)$ is an increasing function of $x$, the decision rule in this case
reduces to

\[ \hat{m} = \arg\max_{1 \le m \le M} |\mathbf{r}_l \cdot \mathbf{s}_{ml}| \tag{4.5-23} \]


From Equation 4.5-23 it is clear that an optimal noncoherent detector first demodulates
the received signal, using its nonsynchronized local oscillator, to obtain $r_l(t)$, the
lowpass equivalent of the received signal. It then correlates $r_l(t)$ with all $s_{ml}(t)$'s and
chooses the one that has the maximum absolute value, or envelope. This detector is
called an envelope detector. Note that Equation 4.5-23 can also be written as

\[ \hat{m} = \arg\max_{1 \le m \le M} \left| \int_{-\infty}^{\infty} r_l(t) s_{ml}^*(t)\,dt \right| \tag{4.5-24} \]

The block diagram of an envelope detector is shown in Figure 4.5-1. Detailed block
diagrams for the demodulator and the complex matched filters shown in this figure are
given in Figures 4.3-9 and 4.3-11, respectively.

FIGURE 4.5-1

Block diagram of an envelope detector.
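
The envelope rule $\hat{m} = \arg\max_m |\mathbf{r}_l \cdot \mathbf{s}_{ml}|$ can be sketched for vector inputs as follows. The function names, the 0-based index, and the 4-ary orthogonal signal set with $\|\mathbf{s}_{ml}\|^2 = 2$ are hypothetical choices for this illustration.

```python
import cmath
import math

def envelope_detect(r_l, signals_l):
    """Return the 0-based index m maximizing |r_l . s_ml|."""
    def envelope(s):
        return abs(sum(r * complex(sc).conjugate() for r, sc in zip(r_l, s)))
    return max(range(len(signals_l)), key=lambda m: envelope(signals_l[m]))

# Hypothetical 4-ary orthogonal set, each lowpass vector with squared norm 2.
amp = math.sqrt(2.0)
signals = [[amp if i == m else 0.0 for i in range(4)] for m in range(4)]

def rotate(s, phi):
    """Apply an unknown carrier phase e^{j*phi} to a lowpass vector."""
    return [cmath.exp(1j * phi) * sc for sc in s]
```

Because only the magnitude of the correlation is used, the decision is unchanged by any phase rotation of the received vector, which is exactly what the noncoherent model requires.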
4.5-2 Optimal Noncoherent Detection of FSK Modulated Signals 


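For equiprobable FSK signaling, the signals have equal energy and the optimal detection
rule is given by Equation 4.5-23. Assuming that the frequency separation between signals
is $\Delta f$, the FSK signals have the general form

\[ \begin{aligned}
s_m(t) &= g(t) \cos\left( 2\pi f_c t + 2\pi (m - 1) \Delta f\, t \right) \\
&= \operatorname{Re}\left[ g(t) e^{j2\pi (m-1) \Delta f\, t} e^{j2\pi f_c t} \right], \qquad 1 \le m \le M
\end{aligned} \tag{4.5-25} \]

Hence,

\[ s_{ml}(t) = g(t) e^{j2\pi (m-1) \Delta f\, t} \tag{4.5-26} \]

where $g(t)$ is a rectangular pulse of duration $T_s$ and $\mathcal{E}_g = 2\mathcal{E}_s$, where $\mathcal{E}_s$ denotes
the energy per transmitted symbol. At the receiver, the optimal noncoherent detector
correlates $r_l(t)$ with $s_{m'l}(t)$ for all $1 \le m' \le M$. Assuming $s_m(t)$ is transmitted, from
Equation 4.5-24 we have

\[ \begin{aligned}
\left| \int_{-\infty}^{\infty} r_l(t) s_{m'l}^*(t)\,dt \right| &= \left| \int_{-\infty}^{\infty} \left( e^{j\phi} s_{ml}(t) + n_l(t) \right) s_{m'l}^*(t)\,dt \right| \\
&= \left| e^{j\phi} \int_{-\infty}^{\infty} s_{ml}(t) s_{m'l}^*(t)\,dt + \int_{-\infty}^{\infty} n_l(t) s_{m'l}^*(t)\,dt \right|
\end{aligned} \tag{4.5-27} \]

But

\[ \begin{aligned}
\int_{-\infty}^{\infty} s_{ml}(t) s_{m'l}^*(t)\,dt &= \frac{2\mathcal{E}_s}{T_s} \int_0^{T_s} e^{j2\pi (m-1) \Delta f\, t} e^{-j2\pi (m'-1) \Delta f\, t}\,dt \\
&= \frac{2\mathcal{E}_s}{T_s} \int_0^{T_s} e^{j2\pi (m - m') \Delta f\, t}\,dt \\
&= \frac{2\mathcal{E}_s}{T_s} \frac{1}{j2\pi (m - m') \Delta f} \left( e^{j2\pi (m - m') \Delta f\, T_s} - 1 \right) \\
&= 2\mathcal{E}_s e^{j\pi (m - m') \Delta f\, T_s} \operatorname{sinc}\left[ (m - m') \Delta f\, T_s \right]
\end{aligned} \tag{4.5-28} \]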

From Equation 4.5-28 we see that if and only if $\Delta f = \frac{k}{T_s}$ for some integer $k$, then
$\langle s_{ml}(t), s_{m'l}(t) \rangle = 0$ for all $m' \ne m$. This is the condition of orthogonality for FSK
signals under noncoherent detection. For coherent detection, however, the detector
uses Equation 4.3-41, and for orthogonality we must have $\operatorname{Re}[\langle s_{ml}(t), s_{m'l}(t) \rangle] = 0$.
But from Equation 3.2-58

\[ \begin{aligned}
\operatorname{Re}\left[ \int_{-\infty}^{\infty} s_{ml}(t) s_{m'l}^*(t)\,dt \right] &= 2\mathcal{E}_s \cos\left( \pi (m - m') \Delta f\, T_s \right) \operatorname{sinc}\left[ (m - m') \Delta f\, T_s \right] \\
&= 2\mathcal{E}_s \operatorname{sinc}\left[ 2 (m - m') \Delta f\, T_s \right]
\end{aligned} \tag{4.5-29} \]

Obviously, the condition for orthogonality in this case is $\Delta f = \frac{k}{2T_s}$. It is clear from the
above discussion that orthogonality under noncoherent detection guarantees
orthogonality under coherent detection, but not vice versa.
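
The two orthogonality conditions can be compared numerically. The sketch below evaluates the complex correlation $2\mathcal{E}_s e^{j\pi(m-m')\Delta f T_s}\operatorname{sinc}[(m-m')\Delta f T_s]$ in closed form and also by direct numerical integration of the defining integral. The function names, the normalization $T_s = 1$, $\mathcal{E}_s = 1$, and the midpoint rule are assumptions of this sketch.

```python
import cmath
import math

def sinc(x):
    """sinc(x) = sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def fsk_corr_closed(dm, df_ts, es=1.0):
    """Closed-form correlation, with dm = m - m' and df_ts = df*Ts."""
    return 2.0 * es * cmath.exp(1j * math.pi * dm * df_ts) * sinc(dm * df_ts)

def fsk_corr_numeric(dm, df_ts, es=1.0, steps=20000):
    """Midpoint-rule value of (2*Es/Ts) * integral_0^Ts exp(j*2*pi*dm*df*t) dt, Ts = 1."""
    h = 1.0 / steps
    acc = sum(cmath.exp(2j * math.pi * dm * df_ts * (i + 0.5) * h) for i in range(steps))
    return 2.0 * es * acc * h
```

At $\Delta f = 1/(2T_s)$ the correlation is purely imaginary, so the signals are orthogonal for a coherent detector but not for an envelope detector; at $\Delta f = 1/T_s$ the complex correlation vanishes altogether.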
The optimal noncoherent detection rule for FSK signaling follows the general rule 
for noncoherent detection of equiprobable and equal-energy signals and is implemented 
using an envelope or a square-law detector. 


4.5-3 Error Probability of Orthogonal Signaling with Noncoherent Detection 


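Let us assume $M$ equiprobable, equal-energy, carrier modulated orthogonal signals are
transmitted over an AWGN channel. These signals are noncoherently demodulated at
the receiver and then optimally detected. For instance, in noncoherent detection of
orthogonal FSK signals we encounter a situation like this. The lowpass equivalent of
the signals can be written as $M$ $N$-dimensional vectors ($N = M$)

\[ \begin{aligned}
\mathbf{s}_{1l} &= (\sqrt{2\mathcal{E}_s}, 0, 0, \ldots, 0) \\
\mathbf{s}_{2l} &= (0, \sqrt{2\mathcal{E}_s}, 0, \ldots, 0) \\
&\;\;\vdots \\
\mathbf{s}_{Ml} &= (0, 0, \ldots, 0, \sqrt{2\mathcal{E}_s})
\end{aligned} \tag{4.5-30} \]

Because of the symmetry of the constellation, without loss of generality we can
assume that $\mathbf{s}_{1l}$ is transmitted. Therefore, the received vector will be

\[ \mathbf{r}_l = e^{j\phi} \mathbf{s}_{1l} + \mathbf{n}_l \tag{4.5-31} \]

where $\mathbf{n}_l$ is a complex circular zero-mean Gaussian random vector with variance of each
complex component equal to $2N_0$ (this follows from the result of Example 2.9-1). The
optimal receiver computes and compares $|\mathbf{r}_l \cdot \mathbf{s}_{ml}|$, for all $1 \le m \le M$. This results in

\[ \begin{aligned}
|\mathbf{r}_l \cdot \mathbf{s}_{1l}| &= |2\mathcal{E}_s e^{j\phi} + \mathbf{n}_l \cdot \mathbf{s}_{1l}| \\
|\mathbf{r}_l \cdot \mathbf{s}_{ml}| &= |\mathbf{n}_l \cdot \mathbf{s}_{ml}|, \qquad 2 \le m \le M
\end{aligned} \tag{4.5-32} \]

For $1 \le m \le M$, $\mathbf{n}_l \cdot \mathbf{s}_{ml}$ is a circular zero-mean complex Gaussian random variable
with variance $4\mathcal{E}_s N_0$ ($2\mathcal{E}_s N_0$ per real and imaginary parts). From Equation 4.5-32 it is
seen that

\[ \begin{aligned}
\operatorname{Re}[\mathbf{r}_l \cdot \mathbf{s}_{1l}] &\sim \mathcal{N}(2\mathcal{E}_s \cos\phi, 2\mathcal{E}_s N_0) \\
\operatorname{Im}[\mathbf{r}_l \cdot \mathbf{s}_{1l}] &\sim \mathcal{N}(2\mathcal{E}_s \sin\phi, 2\mathcal{E}_s N_0) \\
\operatorname{Re}[\mathbf{r}_l \cdot \mathbf{s}_{ml}] &\sim \mathcal{N}(0, 2\mathcal{E}_s N_0), \qquad 2 \le m \le M \\
\operatorname{Im}[\mathbf{r}_l \cdot \mathbf{s}_{ml}] &\sim \mathcal{N}(0, 2\mathcal{E}_s N_0), \qquad 2 \le m \le M
\end{aligned} \tag{4.5-33} \]

From the definition of Rayleigh and Ricean random variables given in Chapter 2 in
Equations 2.3-42 and 2.3-55, we conclude that random variables $R_m$, $1 \le m \le M$,
defined as

\[ R_m = |\mathbf{r}_l \cdot \mathbf{s}_{ml}|, \qquad 1 \le m \le M \tag{4.5-34} \]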
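are independent random variables, $R_1$ has a Ricean distribution with parameters $s = 2\mathcal{E}_s$
and $\sigma^2 = 2\mathcal{E}_s N_0$, and $R_m$, $2 \le m \le M$, are Rayleigh random variables† with parameter
$\sigma^2 = 2\mathcal{E}_s N_0$. In other words,

\[ p_{R_1}(r_1) = \begin{cases}
\frac{r_1}{\sigma^2} I_0\left( \frac{s r_1}{\sigma^2} \right) e^{-\frac{r_1^2 + s^2}{2\sigma^2}} & r_1 > 0 \\
0 & \text{otherwise}
\end{cases} \tag{4.5-35} \]

and

\[ p_{R_m}(r_m) = \begin{cases}
\frac{r_m}{\sigma^2} e^{-\frac{r_m^2}{2\sigma^2}} & r_m > 0 \\
0 & \text{otherwise}
\end{cases} \tag{4.5-36} \]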


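for $2 \le m \le M$. Since by assumption $\mathbf{s}_{1l}$ is transmitted, a correct decision is made at the
receiver if $R_1 > R_m$ for $2 \le m \le M$. Although random variables $R_m$ for $1 \le m \le M$
are statistically independent, the events $R_1 > R_2$, $R_1 > R_3$, ..., $R_1 > R_M$ are not
independent due to the existence of the common $R_1$. To make them independent, we
need to condition on $R_1 = r_1$ and then average over all values of $r_1$. Therefore,

\[ \begin{aligned}
P_c &= P[R_2 < R_1, R_3 < R_1, \ldots, R_M < R_1] \\
&= \int_0^{\infty} P[R_2 < r_1, R_3 < r_1, \ldots, R_M < r_1 \mid R_1 = r_1]\, p_{R_1}(r_1)\,dr_1 \\
&= \int_0^{\infty} \left( P[R_2 < r_1] \right)^{M-1} p_{R_1}(r_1)\,dr_1
\end{aligned} \tag{4.5-37} \]

But

\[ P[R_2 < r_1] = \int_0^{r_1} p_{R_2}(r_2)\,dr_2 = 1 - e^{-\frac{r_1^2}{2\sigma^2}} \tag{4.5-38} \]

Using the binomial expansion, we have

\[ \left( 1 - e^{-\frac{r_1^2}{2\sigma^2}} \right)^{M-1} = \sum_{n=0}^{M-1} (-1)^n \binom{M-1}{n} e^{-\frac{n r_1^2}{2\sigma^2}} \tag{4.5-39} \]

Substituting into Equation 4.5-37, we obtain

\[ \begin{aligned}
P_c &= \sum_{n=0}^{M-1} (-1)^n \binom{M-1}{n} \int_0^{\infty} e^{-\frac{n r_1^2}{2\sigma^2}} \frac{r_1}{\sigma^2} I_0\left( \frac{s r_1}{\sigma^2} \right) e^{-\frac{r_1^2 + s^2}{2\sigma^2}}\,dr_1 \\
&= \sum_{n=0}^{M-1} (-1)^n \binom{M-1}{n} \int_0^{\infty} \frac{r_1}{\sigma^2} I_0\left( \frac{s r_1}{\sigma^2} \right) e^{-\frac{(n+1) r_1^2 + s^2}{2\sigma^2}}\,dr_1 \\
&= \sum_{n=0}^{M-1} (-1)^n \binom{M-1}{n} e^{-\frac{n s^2}{2(n+1)\sigma^2}} \int_0^{\infty} \frac{r_1}{\sigma^2} I_0\left( \frac{s r_1}{\sigma^2} \right) e^{-\frac{(n+1) r_1^2 + \frac{s^2}{n+1}}{2\sigma^2}}\,dr_1
\end{aligned} \tag{4.5-40} \]

†To be more precise, we have to note that $\phi$ is itself a uniform random variable; therefore to obtain the PDF
of $R_m$, we need to first condition on $\phi$ and then average with respect to the uniform PDF. This, however,
does not change the final result stated above.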
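By introducing a change of variables

\[ s' = \frac{s}{\sqrt{n+1}}, \qquad r' = r_1 \sqrt{n+1} \tag{4.5-41} \]

the integral in Equation 4.5-40 becomes

\[ \begin{aligned}
\int_0^{\infty} \frac{r_1}{\sigma^2} I_0\left( \frac{s r_1}{\sigma^2} \right) e^{-\frac{(n+1) r_1^2 + \frac{s^2}{n+1}}{2\sigma^2}}\,dr_1 &= \frac{1}{n+1} \int_0^{\infty} \frac{r'}{\sigma^2} I_0\left( \frac{s' r'}{\sigma^2} \right) e^{-\frac{s'^2 + r'^2}{2\sigma^2}}\,dr' \\
&= \frac{1}{n+1}
\end{aligned} \tag{4.5-42} \]

where in the last step we used the fact that the area under a Ricean PDF is equal to 1.
Substituting Equation 4.5-42 into Equation 4.5-40 and noting that
$\frac{s^2}{2\sigma^2} = \frac{4\mathcal{E}_s^2}{4\mathcal{E}_s N_0} = \frac{\mathcal{E}_s}{N_0}$, we obtain

\[ P_c = \sum_{n=0}^{M-1} \frac{(-1)^n}{n+1} \binom{M-1}{n} e^{-\frac{n}{n+1} \frac{\mathcal{E}_s}{N_0}} \tag{4.5-43} \]

Then the probability of a symbol error becomes

\[ P_e = \sum_{n=1}^{M-1} \frac{(-1)^{n+1}}{n+1} \binom{M-1}{n} e^{-\frac{n \log_2 M}{n+1} \frac{\mathcal{E}_b}{N_0}} \tag{4.5-44} \]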


For binary orthogonal signaling, including binary orthogonal FSK with noncoherent
detection, Equation 4.5-44 simplifies to

\[ P_b = \frac{1}{2} e^{-\frac{\mathcal{E}_b}{2N_0}} \tag{4.5-45} \]


Comparing this result with coherent detection of binary orthogonal signals for which
the error probability is given by

\[ P_b = Q\left( \sqrt{\frac{\mathcal{E}_b}{N_0}} \right) \tag{4.5-46} \]

and using the inequality $Q(x) \le \frac{1}{2} e^{-x^2/2}$, we conclude that $P_{b,\text{noncoh}} \ge P_{b,\text{coh}}$, as
expected. For error probabilities less than $10^{-4}$, the difference between the performance
of coherent and noncoherent detection of binary orthogonal signaling is less than 0.8 dB.
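
The finite sum for the symbol error probability of noncoherently detected orthogonal signals is straightforward to evaluate. In the sketch below the function names are hypothetical, the SNR per bit is a linear ratio, and the symbol SNR is taken as $\mathcal{E}_s/N_0 = (\log_2 M)\,\mathcal{E}_b/N_0$. Because the sum alternates in sign with binomial weights, plain floating-point evaluation is only suitable for moderate $M$.

```python
import math

def pe_noncoherent_orthogonal(M, eb_n0):
    """Symbol error probability:
    sum_{n=1}^{M-1} (-1)^(n+1)/(n+1) * C(M-1, n) * exp(-n/(n+1) * Es/N0)."""
    es_n0 = math.log2(M) * eb_n0
    return sum(
        (-1) ** (n + 1) / (n + 1) * math.comb(M - 1, n) * math.exp(-n / (n + 1) * es_n0)
        for n in range(1, M)
    )

def pb_coherent_binary_orthogonal(eb_n0):
    """Q(sqrt(Eb/N0)), written with erfc."""
    return 0.5 * math.erfc(math.sqrt(eb_n0 / 2.0))
```

For $M = 2$ the sum has a single term, $\frac{1}{2}e^{-\mathcal{E}_b/(2N_0)}$, and it is never smaller than the coherent result $Q(\sqrt{\mathcal{E}_b/N_0})$.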

For $M > 2$, we may compute the probability of a bit error by making use of the
relationship

\[ P_b = \frac{2^{k-1}}{2^k - 1} P_e \tag{4.5-47} \]

which was established in Section 4.4-1. Figure 4.5-2 shows the bit error probability
as a function of the SNR per bit $\gamma_b$ for $M = 2, 4, 8, 16$, and 32. Just as in the case
of coherent detection of M-ary orthogonal signals (see Figure 4.4-1), we observe that
for any given bit error probability, the required SNR per bit decreases as $M$ increases. It will
be shown in Chapter 6 that, in the limit as $M \to \infty$ (or $k = \log_2 M \to \infty$), the
probability of a bit error $P_b$ can be made arbitrarily small provided that the SNR per
bit is greater than the Shannon limit of -1.6 dB. The cost for increasing $M$ is the
bandwidth required to transmit the signals. For M-ary FSK, the frequency separation
between adjacent frequencies is $\Delta f = 1/T_s$ for signal orthogonality. The bandwidth
required for the $M$ signals is $W = M \Delta f = M/T_s$.

FIGURE 4.5-2

Probability of a bit error for noncoherent detection of orthogonal signals.
[Horizontal axis: SNR per bit, $\gamma_b$ (dB).]


4.5-4 Probability of Error for Envelope Detection 
of Correlated Binary Signals 

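In this section, we consider the performance of the envelope detector for binary,
equiprobable, and equal-energy correlated signals. When the two signals are
correlated, we have

\[ \mathbf{s}_{ml} \cdot \mathbf{s}_{m'l} = \begin{cases}
2\mathcal{E}_s & m = m' \\
2\mathcal{E}_s \rho & m \ne m'
\end{cases} \qquad m, m' = 1, 2 \tag{4.5-48} \]

where $\rho$ is the complex correlation between the lowpass equivalent signals. The detector
bases its decision on the envelopes $|\mathbf{r}_l \cdot \mathbf{s}_{1l}|$ and $|\mathbf{r}_l \cdot \mathbf{s}_{2l}|$, which are correlated (statistically
dependent). Assuming that $s_1(t)$ is transmitted, these envelopes are given by

\[ \begin{aligned}
R_1 &= |\mathbf{r}_l \cdot \mathbf{s}_{1l}| = |2\mathcal{E}_s e^{j\phi} + \mathbf{n}_l \cdot \mathbf{s}_{1l}| \\
R_2 &= |\mathbf{r}_l \cdot \mathbf{s}_{2l}| = |2\mathcal{E}_s \rho e^{j\phi} + \mathbf{n}_l \cdot \mathbf{s}_{2l}|
\end{aligned} \tag{4.5-49} \]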
We note that since we are interested in the magnitudes of $2\mathcal{E}_s e^{j\phi} + \mathbf{n}_l \cdot \mathbf{s}_{1l}$ and
$2\mathcal{E}_s \rho e^{j\phi} + \mathbf{n}_l \cdot \mathbf{s}_{2l}$, the effect of $e^{j\phi}$ can be absorbed in the noise component, which is
circular, and such a phase rotation would not affect its statistics. From above it is seen that $R_1$ is
a Ricean random variable with parameters $s_1 = 2\mathcal{E}_s$ and $\sigma^2 = 2\mathcal{E}_s N_0$, and $R_2$ is a
Ricean random variable with parameters $s_2 = 2\mathcal{E}_s |\rho|$ and $\sigma^2 = 2\mathcal{E}_s N_0$. These two
random variables are dependent since the signals are not orthogonal and hence the noise
projections are statistically dependent.

Since $R_1$ and $R_2$ are statistically dependent, the probability of error may be obtained
by evaluating the double integral

\[ P_b = P(R_2 > R_1) = \int_0^{\infty} \int_{x_1}^{\infty} p(x_1, x_2)\,dx_2\,dx_1 \tag{4.5-50} \]

where $p(x_1, x_2)$ is the joint PDF of the envelopes $R_1$ and $R_2$. This approach was first
used by Helstrom (1955), who determined the joint PDF of $R_1$ and $R_2$ and evaluated
the double integral in Equation 4.5-50.

An alternative approach is based on the observation that the probability of error
may also be expressed as

$$P_b = P(R_2 > R_1) = P(R_2^2 > R_1^2) = P(R_2^2 - R_1^2 > 0)$$ (4.5-51)

But $R_2^2 - R_1^2$ is a special case of a general quadratic form in complex-valued Gaussian
random variables, treated later in Appendix B. For the special case under consideration,
the derivation yields the error probability in the form

$$P_b = Q_1(a, b) - \frac{1}{2} e^{-(a^2 + b^2)/2} I_0(ab)$$ (4.5-52)

where

$$a = \sqrt{\frac{\mathcal{E}_b}{2N_0}\left(1 - \sqrt{1 - |\rho|^2}\right)}, \qquad b = \sqrt{\frac{\mathcal{E}_b}{2N_0}\left(1 + \sqrt{1 - |\rho|^2}\right)}$$ (4.5-53)

and $Q_1(a, b)$ is the Marcum Q function defined in Equations 2.3-37 and 2.3-38 and
$I_0(x)$ is the modified Bessel function of order zero. Substituting Equation 4.5-53 into
Equation 4.5-52 yields

$$P_b = Q_1(a, b) - \frac{1}{2} e^{-\mathcal{E}_b/2N_0} I_0\!\left(\frac{\mathcal{E}_b}{2N_0}|\rho|\right)$$ (4.5-54)

The error probability $P_b$ is illustrated in Figure 4.5-3 for several values of $|\rho|$;
$P_b$ is minimized when $\rho = 0$, that is, when the signals are orthogonal. For this case,
$a = 0$, $b = \sqrt{\mathcal{E}_b/N_0}$, and Equation 4.5-54 reduces to

$$P_b = Q_1\!\left(0, \sqrt{\frac{\mathcal{E}_b}{N_0}}\right) - \frac{1}{2} e^{-\mathcal{E}_b/2N_0}$$ (4.5-55)

FIGURE 4.5-3
Probability of error for noncoherent detection of binary FSK.

From the properties of $Q_1(a, b)$ in Equation 2.3-39, it follows that

$$Q_1\!\left(0, \sqrt{\frac{\mathcal{E}_b}{N_0}}\right) = e^{-\mathcal{E}_b/2N_0}$$ (4.5-56)

Substitution of these relations into Equation 4.5-54 yields the desired result given
previously in Equation 4.5-45. On the other hand, when $|\rho| = 1$, $a = b = \sqrt{\mathcal{E}_b/2N_0}$, and
by using Equation 2.3-38 the error probability in Equation 4.5-52 becomes $P_b = \frac{1}{2}$,
as expected.
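As a numerical cross-check of Equations 4.5-52 to 4.5-55, the following sketch evaluates $P_b$ for a given SNR per bit and $|\rho|$. It is an illustration only: the function names are hypothetical, $Q_1(a, b)$ is computed from a Poisson-weighted series rather than from the definitions in Equations 2.3-37 and 2.3-38, and both series are truncated at 200 terms, which is assumed adequate for moderate SNR values.

```python
import math

def bessel_i0(x, terms=200):
    """Modified Bessel function I_0(x) from its power series (moderate x only)."""
    total, term = 0.0, 1.0
    q = x * x / 4.0
    for k in range(terms):
        total += term
        term *= q / ((k + 1) ** 2)
    return total

def marcum_q1(a, b, terms=200):
    """Q_1(a, b) as a Poisson(a^2/2)-weighted sum of incomplete-gamma tails."""
    x, y = a * a / 2.0, b * b / 2.0
    poisson = math.exp(-x)   # e^{-x} x^k / k!, starting at k = 0
    term_y = math.exp(-y)    # e^{-y} y^j / j!, starting at j = 0
    tail_y = term_y          # sum of the j-terms for j = 0..k
    total = 0.0
    for k in range(terms):
        total += poisson * tail_y
        poisson *= x / (k + 1)
        term_y *= y / (k + 1)
        tail_y += term_y
    return total

def pb_correlated(snr_per_bit, rho_mag):
    """P_b of Equation 4.5-52, with a and b taken as
    sqrt((snr/2) * (1 -/+ sqrt(1 - |rho|^2))); snr_per_bit is linear, not dB."""
    root = math.sqrt(1.0 - rho_mag ** 2)
    a = math.sqrt(snr_per_bit / 2.0 * (1.0 - root))
    b = math.sqrt(snr_per_bit / 2.0 * (1.0 + root))
    return marcum_q1(a, b) - 0.5 * math.exp(-(a * a + b * b) / 2.0) * bessel_i0(a * b)
```

For $\rho = 0$ the function reduces to $\frac{1}{2} e^{-\mathcal{E}_b/2N_0}$, and for $|\rho| = 1$ it returns $\frac{1}{2}$, which are the two limiting cases discussed above.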


4.5-5 Differential PSK (DPSK) 

We have seen in Section 4.3-2 that in order to compensate for the phase ambiguity of $2\pi/M$,
which is a result of carrier tracking by phase-locked loops (PLLs), differentially
encoded PSK is used. In differentially encoded PSK, the information sequence determines
the relative phase, or phase transition, between adjacent symbol intervals. Since
in differential PSK the information is in the phase transitions and not in the absolute 
phase, the phase ambiguity from a PLL cancels between the two adjacent intervals and 
will have no effect on the performance of the system. The performance of the system 
is only slightly degraded due to the tendency of errors to occur in pairs, and the overall 
error probability is twice the error probability of a PSK system. 

A differentially encoded phase-modulated signal also allows another type of demodulation
that does not require the estimation of the carrier phase. Therefore, this type
of demodulation/detection of differentially encoded PSK is classified as noncoherent
detection. Since the information is in the phase transition, we have to do the detection
over a period of two symbols. The vector representation of the lowpass equivalent of
the $m$th signal over a period of two symbol intervals is given by

$$s_{ml} = \left(\sqrt{2\mathcal{E}_s},\ \sqrt{2\mathcal{E}_s}\, e^{j\theta_m}\right), \qquad 1 \le m \le M$$ (4.5-57)

where $\theta_m = \frac{2\pi}{M}(m - 1)$ is the phase transition corresponding to the $m$th message. When
$s_{ml}$ is transmitted, the vector representation of the lowpass equivalent of the received
signal on the corresponding two-symbol period is given by

$$r_l = (r_1,\ r_2) = \left(\sqrt{2\mathcal{E}_s},\ \sqrt{2\mathcal{E}_s}\, e^{j\theta_m}\right) e^{j\phi} + (n_{1l},\ n_{2l}), \qquad 1 \le m \le M$$ (4.5-58)

where $n_{1l}$ and $n_{2l}$ are two complex-valued, zero-mean, circular Gaussian random
variables each with variance $2N_0$ (variance $N_0$ for real and imaginary components)
and $\phi$ is the random phase due to noncoherent detection. The key assumption in this
demodulation-detection scheme is that the phase offset $\phi$ remains the same over adjacent
signaling periods. The optimal noncoherent receiver uses Equation 4.5-22 for
optimal detection. We have

$$\begin{aligned}
\hat{m} &= \arg\max_{1 \le m \le M} |r_l \cdot s_{ml}| \\
&= \arg\max_{1 \le m \le M} \sqrt{2\mathcal{E}_s}\, |r_1 + r_2 e^{-j\theta_m}| \\
&= \arg\max_{1 \le m \le M} |r_1 + r_2 e^{-j\theta_m}|^2 \\
&= \arg\max_{1 \le m \le M} \left( |r_1|^2 + |r_2|^2 + 2\,\mathrm{Re}\left[r_1^* r_2 e^{-j\theta_m}\right] \right) \\
&= \arg\max_{1 \le m \le M} \mathrm{Re}\left[r_1^* r_2 e^{-j\theta_m}\right] \\
&= \arg\max_{1 \le m \le M} |r_1 r_2| \cos(\angle r_2 - \angle r_1 - \theta_m) \\
&= \arg\max_{1 \le m \le M} \cos(\angle r_2 - \angle r_1 - \theta_m) \\
&= \arg\min_{1 \le m \le M} |\angle r_2 - \angle r_1 - \theta_m|
\end{aligned}$$ (4.5-59)

Note that $\alpha = \angle r_2 - \angle r_1$ is the phase difference of the received signal in two adjacent
intervals. The receiver computes this phase difference and compares it with
$\theta_m = \frac{2\pi}{M}(m - 1)$ for all $1 \le m \le M$ and selects the $m$ for which $\theta_m$ is closest
to $\alpha$ (with the difference taken modulo $2\pi$), thus maximizing $\cos(\alpha - \theta_m)$. A
differentially encoded PSK signal that uses this method for demodulation detection is called
differential PSK (DPSK). This method of detection has lower complexity in comparison with
coherent detection of PSK signals and can be used in situations where the assumption that
$\phi$ remains constant over two-symbol intervals is valid. As we see below, there is a
performance penalty in employing this detection method.
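The decision rule in Equation 4.5-59 can be sketched in a few lines. The example below is a minimal illustration, not the receiver of Figure 4.5-4: the function name `dpsk_detect` is hypothetical, and it uses the equivalent form $\arg\max \mathrm{Re}[r_1^* r_2 e^{-j\theta_m}]$ so that no explicit phase wrapping is needed.

```python
import cmath
import math

def dpsk_detect(r1, r2, M):
    """Return m in 1..M that maximizes Re[r1* r2 exp(-j theta_m)],
    where theta_m = 2*pi*(m - 1)/M."""
    z = r1.conjugate() * r2  # carries the received phase difference

    def metric(m):
        theta = 2.0 * math.pi * (m - 1) / M
        return (z * cmath.exp(-1j * theta)).real

    return max(range(1, M + 1), key=metric)

# Noiseless check: an unknown but constant phase offset phi does not matter.
M, phi, amp = 8, 0.7, math.sqrt(2.0)
decisions = []
for m in range(1, M + 1):
    theta = 2.0 * math.pi * (m - 1) / M
    r1 = amp * cmath.exp(1j * phi)
    r2 = amp * cmath.exp(1j * (theta + phi))
    decisions.append(dpsk_detect(r1, r2, M))
```

Because only the product $r_1^* r_2$ enters the metric, the common factor $e^{j\phi}$ cancels, which is the reason the detector needs no carrier phase estimate.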

The block diagram for the DPSK receiver is illustrated in Figure 4.5-4. In this
block diagram $g(t)$ represents the baseband pulse used for phase modulation, $T_s$ is the
symbol interval, the block with the $\angle$ symbol is a phase detector, and the block with $T_s$
introduces a delay equal to the symbol interval $T_s$.

FIGURE 4.5-4
The DPSK receiver.

Performance of Binary DPSK In binary DPSK the phase difference between
adjacent symbols is either 0 or $\pi$, corresponding to a 0 or 1. The two lowpass equivalent
signals are

$$s_{1l} = \left(\sqrt{2\mathcal{E}_s},\ \sqrt{2\mathcal{E}_s}\right)$$
$$s_{2l} = \left(\sqrt{2\mathcal{E}_s},\ -\sqrt{2\mathcal{E}_s}\right)$$ (4.5-60)

These two signals are noncoherently demodulated and detected using the general approach
for optimal noncoherent detection. It is clear that the two signals are orthogonal
on an interval of length $2T_s$. Therefore, the error probability can be obtained from
the expression for the error probability of binary orthogonal signaling given in Equation
4.5-45. The difference is that the energy in each of the signals $s_1(t)$ and $s_2(t)$ is
$2\mathcal{E}_s$. This is seen easily from Equation 4.5-60, which shows that the energy in the lowpass
equivalents is $4\mathcal{E}_s$. Therefore,

$$P_b = \frac{1}{2} e^{-2\mathcal{E}_s/2N_0} = \frac{1}{2} e^{-\mathcal{E}_b/N_0}$$ (4.5-61)

This is the bit error probability for binary DPSK. Comparing this result with coherent
detection of BPSK, where the error probability is given by

$$P_b = Q\!\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)$$ (4.5-62)

we observe that by the inequality $Q(x) \le \frac{1}{2} e^{-x^2/2}$, we have

$$P_{b,\text{coh}} \le P_{b,\text{noncoh}}$$ (4.5-63)

as expected. This is similar to the result we previously had for coherent and noncoherent
detection of binary orthogonal FSK. Here again the difference between the performance
of BPSK with coherent detection and binary DPSK at high SNRs is less than 0.8 dB.
The plots given in Figure 4.5-5 compare the performance of coherently detected BPSK
with binary DPSK.

FIGURE 4.5-5
Probability of error for binary PSK and DPSK (horizontal axis: SNR per bit, $\gamma_b$, in dB).
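The size of the gap between Equations 4.5-61 and 4.5-62 can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the target error probability of $10^{-5}$ is chosen here only as a representative high-SNR operating point, and the required SNR is found by a simple bisection.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pb_bpsk(snr):
    """Coherent BPSK, Equation 4.5-62 (snr = E_b/N_0, linear)."""
    return q_function(math.sqrt(2.0 * snr))

def pb_dpsk(snr):
    """Binary DPSK, Equation 4.5-61 (snr = E_b/N_0, linear)."""
    return 0.5 * math.exp(-snr)

def required_snr_db(pb_func, target, lo=0.01, hi=100.0):
    """Bisection on linear SNR for pb_func(snr) = target; result in dB."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pb_func(mid) > target:
            lo = mid
        else:
            hi = mid
    return 10.0 * math.log10(0.5 * (lo + hi))

gap_db = required_snr_db(pb_dpsk, 1e-5) - required_snr_db(pb_bpsk, 1e-5)
```

At this operating point `gap_db` comes out between 0.7 and 0.8 dB, in line with the statement that the penalty of binary DPSK is less than 0.8 dB at high SNR.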

Performance of DQPSK Differential QPSK is similar to binary DPSK, except that
the phase difference between adjacent symbol intervals depends on two information
bits ($k = 2$) and is equal to $0$, $\pi/2$, $\pi$, and $3\pi/2$ for 00, 01, 11, and 10, respectively,
when Gray coding is employed. Assuming that the transmitted binary sequence is 00,
corresponding to a phase shift of zero in two adjacent intervals, the lowpass equivalent
of the received signal over two-symbol intervals with noncoherent demodulation is
given by

$$r_l = (r_1,\ r_2) = \left(\sqrt{2\mathcal{E}_s},\ \sqrt{2\mathcal{E}_s}\right) e^{j\phi} + (n_1,\ n_2)$$ (4.5-64)

where $n_1$ and $n_2$ are independent, zero-mean, circular, complex Gaussian random variables
each with variance $2N_0$ (variance $N_0$ per real and imaginary component). The
optimal decision region for 00 is given by Equation 4.5-59 as

$$D_{00} = \left\{ r_l : \mathrm{Re}\left[r_1^* r_2\right] > \mathrm{Re}\left[r_1^* r_2 e^{-jm\pi/2}\right], \ \text{for } m = 1, 2, 3 \right\}$$ (4.5-65)

where $r_1 = \sqrt{2\mathcal{E}_s}\, e^{j\phi} + n_1$ and $r_2 = \sqrt{2\mathcal{E}_s}\, e^{j\phi} + n_2$. We note that, because the
noise is circular, the statistics of $r_1^* r_2$ do not depend on $\phi$. The error probability is the
probability that the received vector $r_l$ does not belong to $D_{00}$. As seen from Equation 4.5-65,
this probability depends on the product of two complex Gaussian random variables $r_1^*$ and $r_2$.
A general form of this problem, where general quadratic forms of complex Gaussian random
variables are considered, is given in Appendix B. Using the result of Appendix B we can show
that the bit error probability for DQPSK, when Gray coding is employed, is given by

$$P_b = Q_1(a, b) - \frac{1}{2} I_0(ab)\, e^{-(a^2 + b^2)/2}$$ (4.5-66)

where $Q_1(a, b)$ is the Marcum Q function defined by Equations 2.3-37 and 2.3-38,
$I_0(x)$ is the modified Bessel function of order zero, defined by Equations 2.3-32 to
2.3-34, and the parameters $a$ and $b$ are defined as

$$a = \sqrt{\frac{2\mathcal{E}_b}{N_0}\left(1 - \frac{1}{\sqrt{2}}\right)}, \qquad b = \sqrt{\frac{2\mathcal{E}_b}{N_0}\left(1 + \frac{1}{\sqrt{2}}\right)}$$ (4.5-67)


Figure 4.5-6 illustrates the probability of a binary digit error for two- and four-phase
DPSK and coherent PSK signaling obtained from evaluating the exact formulas derived
in this section. Since binary DPSK is only slightly inferior to binary PSK at large SNR,
and DPSK does not require an elaborate method for estimating the carrier phase, it
is often used in digital communication systems. On the other hand, four-phase DPSK
is approximately 2.3 dB poorer in performance than four-phase PSK at large SNR.
Consequently the choice between these two four-phase systems is not as clear-cut. One
must weigh the 2.3-dB loss against the reduction in implementation complexity.

FIGURE 4.5-6
Probability of bit error for binary and four-phase PSK and DPSK.
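The 2.3-dB figure can be traced to the exponent of Equation 4.5-66. The short derivation below is a sketch under two assumptions that are not spelled out in the text: that $a$ and $b$ take their standard values for Gray-coded DQPSK, and that at large SNR both $Q_1(a, b)$ and $I_0(ab)\,e^{-(a^2+b^2)/2}$ decay as $e^{-(b-a)^2/2}$ up to algebraic factors.

```latex
% Assumed parameter values, with \gamma_b = \mathcal{E}_b / N_0:
a = \sqrt{2\gamma_b\left(1 - \tfrac{1}{\sqrt{2}}\right)}, \qquad
b = \sqrt{2\gamma_b\left(1 + \tfrac{1}{\sqrt{2}}\right)}

% Sum of squares and product:
a^2 + b^2 = 4\gamma_b, \qquad
ab = 2\gamma_b\sqrt{1 - \tfrac{1}{2}} = \sqrt{2}\,\gamma_b

% Dominant exponent of Equation 4.5-66 at large SNR:
\frac{(b - a)^2}{2} = \frac{a^2 + b^2}{2} - ab = (2 - \sqrt{2})\,\gamma_b \approx 0.586\,\gamma_b

% Coherent four-phase PSK, P_b = Q(\sqrt{2\gamma_b}), decays as e^{-\gamma_b}; the SNR penalty is
10\log_{10}\frac{1}{2 - \sqrt{2}} \approx 2.32\ \text{dB}
```

The same argument applied to binary DPSK, whose exponent already equals $\gamma_b$, gives no exponential penalty, which is consistent with its loss of less than 1 dB relative to coherent BPSK.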


4.6 A COMPARISON OF DIGITAL SIGNALING METHODS

The digital modulation methods described in the previous sections can be compared 
in a number of ways. For example, one can compare them on the basis of the SNR 
required to achieve a specified probability of error. However, such a comparison would 
not be very meaningful, unless it were made on the basis of some constraint, such as 
a fixed data rate of transmission or, equivalently, on the basis of a fixed bandwidth. 
We have already studied two major classes of signaling methods, i.e., bandwidth and 
power-efficient signaling in Sections 4.3 and 4.4, respectively. 

The power efficiency of a signaling scheme is measured by the SNR per bit,
$\gamma_b = \mathcal{E}_b/N_0$, that the scheme requires to achieve a given error probability. The
error probability usually considered for comparison of various signaling schemes is
$P_e = 10^{-5}$. Systems requiring lower $\gamma_b$ to achieve this error probability are more
power-efficient.

To measure the bandwidth efficiency, we define a parameter $r$, called the spectral
bit rate, or the bandwidth efficiency, as the ratio of the bit rate of the signaling scheme to
its bandwidth, i.e.,

$$r = \frac{R}{W} \quad \text{b/s/Hz}$$ (4.6-1)

A system with larger $r$ is a more bandwidth-efficient system since it can transmit at a
higher bit rate in each hertz of bandwidth. The parameters $r$ and $\gamma_b$ defined above are
the two criteria we use for comparison of power and bandwidth efficiency of different
modulation schemes. Clearly, a good system is the one that at a given $\gamma_b$ provides the
highest $r$, or at a given $r$ requires the least $\gamma_b$.

The relation between $\gamma_b$ and the error probability for individual systems was discussed
in detail for different signaling schemes in the previous sections. From the
expressions for error probability of various systems derived earlier in this chapter, it is
easy to determine what $\gamma_b$ is required to achieve an error probability of $10^{-5}$ in each
system. In this section we discuss the relation between the bandwidth efficiency and
the main parameters of a given signaling scheme.


4.6-1 Bandwidth and Dimensionality 

The sampling theorem states that in order to reconstruct a signal with bandwidth $W$,
we need to sample this signal at a rate of at least $2W$ samples per second. In other
words, this signal has $2W$ degrees of freedom (dimensions) per second. Therefore,
the dimensionality of signals with bandwidth $W$ and duration $T$ is $N = 2WT$. Although
this intuitive reasoning is sufficient for our development, this statement is not
precise.

It is a well-known fact, which follows from the theory of entire functions, that the
only signal that is both time- and bandwidth-limited is the trivial signal $x(t) = 0$. All
other signals have infinite bandwidth, infinite duration, or both. In spite of this fact,
all practical signals are approximately time- and bandwidth-limited. Recall that a real
signal $x(t)$ has an energy $\mathcal{E}_x$ given by

$$\mathcal{E}_x = \int_{-\infty}^{\infty} x^2(t)\, dt = \int_{-\infty}^{\infty} |X(f)|^2\, df$$ (4.6-2)

Here we focus on time-limited signals that are nearly bandwidth-limited. We assume
that the support of $x(t)$, i.e., where $x(t)$ is nonzero, is the interval $[-T/2, T/2]$; and
we also assume that $x(t)$ is $\eta$-bandwidth-limited to $W$, i.e., we assume that at most
a fraction $\eta$ of the energy in $x(t)$ is outside the frequency band $[-W, W]$. In other
words,

$$\frac{1}{\mathcal{E}_x} \int_{-W}^{W} |X(f)|^2\, df \ge 1 - \eta$$ (4.6-3)

The dimensionality theorem stated below gives a precise account of the number
of dimensions of the space of such signals $x(t)$.


The Dimensionality Theorem Consider the set of all signals $x(t)$ with support
$[-T/2, T/2]$ that are $\eta$-bandwidth-limited to $W$. Then there exists a set of $N$ orthonormal
signals† $\{\phi_j(t), 1 \le j \le N\}$ with support $[-T/2, T/2]$ such that $x(t)$ can be
$\epsilon$-approximated by this set of orthonormal signals, i.e.,

$$\frac{1}{\mathcal{E}_x} \int_{-\infty}^{\infty} \left( x(t) - \sum_{j=1}^{N} \langle x(t), \phi_j(t) \rangle\, \phi_j(t) \right)^2 dt < \epsilon$$ (4.6-4)

where $\epsilon = 12\eta$ and $N = \lfloor 2WT + 1 \rfloor$.

From the dimensionality theorem we can see that the relation

$$N \approx 2WT$$ (4.6-5)

is a good approximation to the dimensionality of the space of functions that are roughly
time-limited to $T$ and band-limited to $W$.

The dimensionality theorem helps us to derive a relation between bandwidth and
dimensionality of a signaling scheme. If the set of signals in a signaling scheme consists
of $M$ signals each with duration $T_s$, the signaling interval, and the approximate bandwidth
of the set of signals is $W$, the dimensionality of the signal space is $N = 2WT_s$.

†Signals $\phi_j(t)$ can be expressed in terms of the prolate spheroidal wave functions.

Using the relation $R_s = 1/T_s$, we have

$$W = \frac{R_s N}{2}$$ (4.6-6)

Since $R = R_s \log_2 M$, we conclude that

$$W = \frac{RN}{2\log_2 M}$$ (4.6-7)

and

$$r = \frac{R}{W} = \frac{2\log_2 M}{N}$$ (4.6-8)

This relation gives the bandwidth efficiency of a signaling scheme in terms of the
constellation size and the dimensionality of the constellation.

In one-dimensional modulation schemes (ASK and PAM), $N = 1$ and $r = 2\log_2 M$.
PAM and ASK can be transmitted as single-sideband (SSB) signals.

For two-dimensional signaling schemes such as QAM and MPSK, we have $N = 2$
and $r = \log_2 M$. It is clear from the above discussion that in MASK, MPSK, and
MQAM signaling schemes the bandwidth efficiency increases as $M$ increases. As we
have seen before in all these systems, the power efficiency decreases as $M$ is increased.
Therefore, the size of constellation in these systems determines the tradeoff between
power and bandwidth efficiency. These systems are appropriate where we have limited
bandwidth and desire a bit rate-to-bandwidth ratio $r > 1$ and where there is sufficiently
high SNR to support increases in $M$. Telephone channels and digital microwave radio
channels are examples of such band-limited channels.

For $M$-ary orthogonal signaling, $N = M$ and hence Equation 4.6-8 results in

$$r = \frac{2\log_2 M}{M}$$ (4.6-9)

Obviously in this case as $M$ increases, the bandwidth efficiency decreases, and for
large $M$ the system becomes very bandwidth-inefficient. Again as we had seen before
in orthogonal signaling, increasing $M$ improves the power efficiency of the system,
and in fact this system is capable of achieving the Shannon limit as $M$ increases. Here
again the tradeoff between bandwidth and power efficiency is clear. Consequently,
$M$-ary orthogonal signals are appropriate for power-limited channels that have sufficiently
large bandwidth to accommodate a large number of signals. One example of
such channels is the deep space communication channel.
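The relation $r = 2\log_2 M / N$ of Equation 4.6-8 is easy to tabulate. The sketch below is only an illustration; the function name and the sample constellation sizes are chosen here for the example.

```python
import math

def bandwidth_efficiency(M, N):
    """r = R/W = 2 log2(M) / N in b/s/Hz, for M signals in N dimensions."""
    return 2.0 * math.log2(M) / N

examples = {
    "4-PAM (N = 1)": bandwidth_efficiency(4, 1),              # 4.0
    "16-QAM (N = 2)": bandwidth_efficiency(16, 2),            # 4.0
    "16-ary orthogonal (N = 16)": bandwidth_efficiency(16, 16),  # 0.5
}
```

The first two entries grow with $M$, whereas the orthogonal case shrinks as $M$ increases, which is the bandwidth side of the tradeoff described above.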

We encounter the tradeoff between bandwidth and power efficiency in many com- 
munication scenarios. Coding techniques treated in Chapters 7 and 8 study various 
practical methods to achieve this tradeoff. 

We will show in Chapter 6 that there exists a fundamental tradeoff between bandwidth
and power efficiency. This tradeoff between $r$ and $\mathcal{E}_b/N_0$ holds as $P_e$ tends to
zero and is given by (see Equation 6.5-49)

$$\frac{\mathcal{E}_b}{N_0} > \frac{2^r - 1}{r}$$ (4.6-10)

Equation 4.6-10 gives the condition under which reliable communication is possible.
This relation should hold for any communication system. As $r$ tends to 0 (bandwidth
becomes infinite), we can obtain the fundamental limit on the required $\mathcal{E}_b/N_0$ in
a communication system. This limit is the $-1.6$ dB Shannon limit discussed before.
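The bound $(2^r - 1)/r$ of Equation 4.6-10 and its limit as $r \to 0$ can be evaluated directly. The following is a small sketch with a hypothetical function name; `math.expm1` is used only to keep the computation accurate for very small $r$.

```python
import math

def min_ebn0_db(r):
    """Lower bound (2^r - 1)/r on E_b/N_0, expressed in dB, for r > 0 b/s/Hz."""
    ratio = math.expm1(r * math.log(2.0)) / r  # (2^r - 1) / r
    return 10.0 * math.log10(ratio)

limit_db = min_ebn0_db(1e-6)            # close to 10*log10(ln 2)
bounds = [min_ebn0_db(r) for r in (0.5, 1.0, 2.0, 4.0, 8.0)]
```

As $r \to 0$ the ratio tends to $\ln 2$, that is, about $-1.59$ dB, and the bound increases monotonically with $r$, so higher bandwidth efficiency always costs additional SNR per bit.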

Figure 4.6-1 illustrates the graph of $r = R/W$ versus SNR per bit for PAM, QAM,
PSK, and orthogonal signals, for the case in which the error probability is $P_M = 10^{-5}$.
Shannon's fundamental limit given by Equation 4.6-10 is also plotted in this figure.
Communication is, at least theoretically, possible at any point below this curve and is
impossible at points above it.

FIGURE 4.6-1
Comparison of several modulation schemes at $P_e = 10^{-5}$ symbol error probability.

4.7 LATTICES AND CONSTELLATIONS BASED ON LATTICES


In band-limited channels, when the available SNR is large, large QAM constellations 
are desirable to achieve high bandwidth efficiency. We have seen examples of QAM 
constellations in Figures 3.2-4 and 3.2-5. Figure 3.2-5 is particularly interesting since 
it has a useful grid-shaped repetitive pattern in two-dimensional space. Using such 
repetitive patterns for designing constellations is a common practice. In this approach 
to constellation design, a repetitive infinite grid of points and a boundary for the con- 
stellation are selected. The constellation is then defined as the set of points of the 
repetitive grid that are within the selected boundary. Lattices are mathematical struc- 
tures that define the main properties of the repetitive grid of points used in constellation 
design. In this section we study properties of lattices, boundaries, and the lattice-based 
constellations. 


4.7-1 An Introduction to Lattices 

An $n$-dimensional lattice is defined as a discrete subset of $\mathbb{R}^n$ that has a group structure
under ordinary vector addition. By having a group structure we mean that any two
lattice points can be added and the result is another lattice point, there exists a point in
the lattice denoted by $0$ that when added to any lattice point $x$ the result is $x$ itself, and
for any $x$ there exists another point in the lattice, denoted by $-x$, that when added to
$x$ results in $0$.

With the lattice definition given above, it is clear that $\mathbb{Z}$, the set of integers, is a
one-dimensional lattice. Moreover, for any $\alpha > 0$, the set $\Lambda = \alpha\mathbb{Z}$ is a one-dimensional
lattice. In the plane, $\mathbb{Z}^2$, the set of all points with integer coordinates, is a two-dimensional
lattice. Another example of a two-dimensional lattice, called the hexagonal lattice, is the
set of points shown in Figure 4.7-1. These points can be written as
$a(1, 0) + b\left(\frac{1}{2}, \frac{\sqrt{3}}{2}\right)$, where $a$ and $b$ are integers. The hexagonal lattice is
usually denoted by $A_2$.

FIGURE 4.7-1
The two-dimensional hexagonal lattice.

In general, an $n$-dimensional lattice $\Lambda$ can be defined in terms of $n$ basis vectors
$g_i \in \mathbb{R}^n$, $1 \le i \le n$, such that any lattice point $x$ can be written as a linear combination
of the $g_i$'s using integer coefficients. In other words, for any $x \in \Lambda$,

$$x = \sum_{i=1}^{n} a_i g_i$$ (4.7-1)

where $a_i \in \mathbb{Z}$ for $1 \le i \le n$. We can also define $\Lambda$ in terms of an $n \times n$ generator
matrix, denoted by $G$, whose rows are $\{g_i, 1 \le i \le n\}$. Since the basis vectors can be
selected differently, the generator matrix of a lattice is not unique. With this definition,
for any $x \in \Lambda$,

$$x = aG$$ (4.7-2)

where $a \in \mathbb{Z}^n$ is an $n$-dimensional vector with integer components. Equation 4.7-2
states that any $n$-dimensional lattice $\Lambda$ can be viewed as a linear transformation of $\mathbb{Z}^n$
where the transformation is represented by matrix $G$. In particular, all one-dimensional
lattices can be represented as $\alpha\mathbb{Z}$ for some $\alpha > 0$.
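The relation $x = aG$ can be illustrated by enumerating a finite patch of a lattice. The sketch below is only an example: the function name is hypothetical, the coefficient range is an arbitrary truncation, and the two generator matrices are the hexagonal basis $(1, 0)$, $(\frac{1}{2}, \frac{\sqrt{3}}{2})$ given earlier and, as an assumed second example, the matrix with rows $(1, 1)$ and $(-1, 1)$.

```python
import itertools
import math

def lattice_points(G, coeff_range):
    """All points x = aG with every integer coefficient a_i in coeff_range.
    The rows of G are the basis vectors g_i."""
    n = len(G)
    dim = len(G[0])
    points = []
    for a in itertools.product(coeff_range, repeat=n):
        x = tuple(sum(a[i] * G[i][j] for i in range(n)) for j in range(dim))
        points.append(x)
    return points

HEX = [[1.0, 0.0], [0.5, math.sqrt(3.0) / 2.0]]  # hexagonal lattice basis
ROT = [[1, 1], [-1, 1]]                           # assumed example: rows (1,1), (-1,1)

hex_patch = lattice_points(HEX, range(-1, 2))
rot_patch = lattice_points(ROT, range(-2, 3))
```

Every point in `rot_patch` has integer coordinates whose sum is even, which shows how a generator matrix other than the identity selects a subset of the integer grid.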

The generator matrix of $\mathbb{Z}^2$ is $I_2$, the $2 \times 2$ identity matrix. In general the generator
matrix of $\mathbb{Z}^n$ is $I_n$. The generator matrix of the hexagonal lattice is given by

$$G = \begin{bmatrix} 1 & 0 \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}$$ (4.7-3)


Two lattices are called equivalent if one can be obtained from the other by a
rotation, reflection, scaling, or combination of these operations. Rotation and reflection
operations are represented by orthogonal matrices. Orthogonal matrices are matrices
whose columns constitute a set of orthonormal vectors. If $A$ is an orthogonal matrix,
then $AA^t = A^tA = I$. In general, any operation of the form $\alpha G$ on the lattice, where
$\alpha > 0$ and $G$ is orthogonal, results in an equivalent lattice. For instance, the lattice with
the generator matrix

$$G = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$ (4.7-4)

is obtained from $\mathbb{Z}^2$ by a rotation of 45°; therefore it is equivalent to $\mathbb{Z}^2$. Note that
$GG^t = I$. If after rotation the resulting lattice is scaled by $\sqrt{2}$, the overall generator
matrix will be

$$G = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$ (4.7-5)

This lattice is the set of points in $\mathbb{Z}^2$ for which the sum of the two coordinates is even.
This lattice is also equivalent to $\mathbb{Z}^2$. Matrix $G$ in Equation 4.7-5, which represents a
rotation of 45° and a scaling of $\sqrt{2}$, is usually denoted by $R$. Therefore, $R\mathbb{Z}^2$ denotes
the lattice of all integer coordinate points in the plane with an even sum of coordinates.
It can be easily verified that $R^2\mathbb{Z}^2 = 2\mathbb{Z}^2$.

Translating (shifting) a lattice by a vector $c$ is denoted by $\Lambda + c$, and the result, in
general, is not a lattice because under a general translation there is no guarantee that $0$
will be a member of the translated lattice. However, if the translation vector is a lattice
point, i.e., if $c \in \Lambda$, then the result of translation is the original lattice. From this we
conclude that any point in the lattice is similar to any other point, in the sense that all
points of the lattice have the same number of lattice points at a given distance. Although
translation of a lattice is not a lattice in general, the result is congruent to the original
lattice with the same geometric properties. Translation of lattices is frequently used to
generate energy-efficient constellations. Note that the QAM constellations shown in
Figure 4.7-2 consist of points in a translated version of $\mathbb{Z}^2$ where the shift vector is
$\left(\frac{1}{2}, \frac{1}{2}\right)$; i.e., the constellation points are subsets of $\mathbb{Z}^2 + \left(\frac{1}{2}, \frac{1}{2}\right)$.

FIGURE 4.7-2
QAM constellation.

In addition to rotation, reflection, scaling, and translation of lattices, we introduce
the notion of the $M$-fold Cartesian product of lattice $\Lambda$. The $M$-fold Cartesian product
of $\Lambda$ is another lattice, denoted by $\Lambda^M$, whose elements are $(Mn)$-dimensional vectors
$(\lambda_1, \lambda_2, \ldots, \lambda_M)$ where each $\lambda_j$ is in $\Lambda$. We observe that $\mathbb{Z}^n$ is the $n$-fold Cartesian
product of $\mathbb{Z}$.

The minimum distance $d_{\min}(\Lambda)$ of a lattice $\Lambda$ is the minimum Euclidean distance
between any two lattice points; and the kissing number, or the multiplicity, denoted by
$N_{\min}(\Lambda)$, is the number of points in the lattice that are at minimum distance from a
given lattice point. If $n$-dimensional spheres with radius $d_{\min}(\Lambda)/2$ are centered at lattice
points, the kissing number is the number of spheres that touch one of these spheres. For
the hexagonal lattice $d_{\min}(A_2) = 1$ and $N_{\min}(A_2) = 6$. For $\mathbb{Z}^n$, we have $d_{\min}(\mathbb{Z}^n) = 1$
and $N_{\min}(\mathbb{Z}^n) = 2n$. In this lattice the nearest neighbors of $0$ are points with $n - 1$ zero
coordinates and one coordinate equal to $\pm 1$.
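For small examples the minimum distance and kissing number can be found by brute force. The sketch below is an illustration, not a general algorithm: the function name is hypothetical, and searching integer coefficients only in $[-2, 2]$ is an assumption that is adequate for the well-conditioned bases used here but can miss short vectors for a badly skewed basis.

```python
import itertools
import math

def min_distance_and_kissing(G, span=2, tol=1e-9):
    """Brute-force d_min and kissing number from the nonzero points aG
    with integer coefficients in [-span, span]."""
    n = len(G)
    norms = []
    for a in itertools.product(range(-span, span + 1), repeat=n):
        if any(a):  # skip the origin
            x = [sum(a[i] * G[i][j] for i in range(n)) for j in range(n)]
            norms.append(math.sqrt(sum(v * v for v in x)))
    d_min = min(norms)
    kissing = sum(1 for d in norms if abs(d - d_min) < tol)
    return d_min, kissing

Z2 = [[1, 0], [0, 1]]
Z3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
HEX = [[1.0, 0.0], [0.5, math.sqrt(3.0) / 2.0]]

z2_result = min_distance_and_kissing(Z2)    # expect d_min = 1, kissing = 4
z3_result = min_distance_and_kissing(Z3)    # expect d_min = 1, kissing = 6
hex_result = min_distance_and_kissing(HEX)  # expect d_min = 1, kissing = 6
```

The results reproduce $N_{\min}(\mathbb{Z}^n) = 2n$ for $n = 2, 3$ and $N_{\min}(A_2) = 6$.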

The Voronoi region of a lattice point $x$ is the set of all points in $\mathbb{R}^n$ that are closer to
$x$ than any other lattice point. The boundary of the Voronoi region of a lattice point $x$
consists of the perpendicular bisector hyperplanes of the line segments connecting $x$ to
its nearest neighbors in the lattice. Therefore, a Voronoi region is a polyhedron bounded
by $N_{\min}(\Lambda)$ hyperplanes. The Voronoi region of the point $0$ in the hexagonal lattice is
the hexagon shown in Figure 4.7-3. Since all points of the lattice have similar distances
from other lattice points, the Voronoi regions of all lattice points are congruent. In
addition, the Voronoi regions are disjoint and cover $\mathbb{R}^n$; hence the Voronoi regions of
a lattice induce a partition of $\mathbb{R}^n$.

FIGURE 4.7-3
The Voronoi region in the hexagonal lattice.

The fundamental volume of a lattice is defined as the volume of the Voronoi region
of the lattice and is denoted by $V(\Lambda)$. Since there exists one lattice point per fundamental
volume, we can define the fundamental volume as the reciprocal of the number of lattice
points per unit volume. It can be shown (see the book by Conway and Sloane (1999))
that for any lattice

$$V(\Lambda) = |\det(G)|$$ (4.7-6)

We notice that $V(\mathbb{Z}^n) = 1$ and $V(A_2) = \frac{\sqrt{3}}{2}$.

Rotation, reflection, and translation do not change the fundamental volume, the
minimum distance, or the kissing number of a lattice. Scaling a lattice $\Lambda$ with generator
matrix $G$ by $\alpha > 0$ results in a lattice $\alpha\Lambda$ with generator matrix $\alpha G$; hence

$$V(\alpha\Lambda) = |\det(\alpha G)| = \alpha^n V(\Lambda)$$ (4.7-7)

The minimum distance of the scaled lattice is obviously scaled by $\alpha$. The kissing
number of the scaled lattice is equal to the kissing number of the original lattice.

The Hermite parameter of a lattice is denoted by $\gamma_c(\Lambda)$ and is defined as

$$\gamma_c(\Lambda) = \frac{d_{\min}^2(\Lambda)}{[V(\Lambda)]^{2/n}}$$ (4.7-8)

This parameter has an important role in defining the coding gain of the lattice. It is
clear that $\gamma_c(\mathbb{Z}^n) = 1$ and $\gamma_c(A_2) = \frac{2}{\sqrt{3}} \approx 1.1547$.

Since $1/V(\Lambda)$ indicates the number of lattice points per unit volume, we conclude
that among lattices with a given minimum distance, those with a higher Hermite parameter
are denser in the sense that they have more points per unit volume. In other
words, for a given $d_{\min}$, a lattice with high $\gamma_c$ packs more points in unit volume. This is
exactly what we need in constellation design since $d_{\min}$ determines the error probability
and having more points per unit volume improves bandwidth efficiency. It is clear from
above that $A_2$ can provide 15% higher coding gain than the integer lattice $\mathbb{Z}^2$.
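The fundamental volume and the Hermite parameter follow directly from a generator matrix once $d_{\min}$ is known. The sketch below is a minimal illustration with hypothetical function names; the determinant routine is a plain Gaussian elimination written out so that the example needs no external library, and $d_{\min}$ is supplied by the caller rather than computed.

```python
import math

def determinant(G):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [[float(v) for v in row] for row in G]
    n, det = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            det = -det  # a row swap flips the sign
        det *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return det

def fundamental_volume(G):
    """V = |det(G)|."""
    return abs(determinant(G))

def hermite_parameter(G, d_min):
    """gamma_c = d_min^2 / V^(2/n) for an n x n generator matrix G."""
    n = len(G)
    return d_min ** 2 / fundamental_volume(G) ** (2.0 / n)

Z2 = [[1, 0], [0, 1]]
HEX = [[1.0, 0.0], [0.5, math.sqrt(3.0) / 2.0]]
gamma_z2 = hermite_parameter(Z2, 1.0)    # 1
gamma_hex = hermite_parameter(HEX, 1.0)  # 2/sqrt(3), about 1.1547
```

Scaling the generator matrix by any $\alpha > 0$ (and $d_{\min}$ with it) leaves the returned value unchanged, which is a quick way to see that the Hermite parameter is dimensionless.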

Some properties of $\gamma_c(\Lambda)$ are listed below. The interested reader is referred to the
paper by Forney (1988) for details.

1. $\gamma_c(\Lambda)$ is a dimensionless parameter.

2. $\gamma_c(\Lambda)$ is invariant to scaling and orthogonal transformations (rotation and reflection).

3. For all $M$, $\gamma_c(\Lambda)$ is invariant to the $M$-fold Cartesian product extension of the lattice;
i.e., $\gamma_c(\Lambda^M) = \gamma_c(\Lambda)$.


Multidimensional Lattices 

Most lattice examples presented so far are one- or two-dimensional. We have also
introduced the $n$-dimensional lattice $\mathbb{Z}^n$, which is an $n$-fold Cartesian product of $\mathbb{Z}$. In
designing efficient multidimensional constellations, sometimes it is necessary to use
lattices different from $\mathbb{Z}^n$. We introduce some common multidimensional lattices in
this section.

We have already introduced the two-dimensional rotation and scaling matrix $R$ as

$$R = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$ (4.7-9)

This notion can be generalized to four dimensions as

$$R = \begin{bmatrix} 1 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & -1 & 1 \end{bmatrix}$$ (4.7-10)

It is seen that $RR^t = 2I_4$, and that $R^2$ equals 2 times a rotation by 90° within each pair
of coordinates. Extension of this notion from 4 to $2n$ dimensions is straightforward. As a
result, for any $2n$-dimensional lattice $\Lambda$ that is invariant under this 90° rotation we have
$R^2\Lambda = 2\Lambda$. In particular $R^2\mathbb{Z}^4 = 2\mathbb{Z}^4$. Note that $R\mathbb{Z}^4$ is a lattice whose members are
4-tuples of integers in which the sum of the first two coordinates and the sum of the last two
coordinates are even. Therefore $R\mathbb{Z}^4$ is a sublattice of $\mathbb{Z}^4$. In general, a sublattice of $\Lambda$,
denoted by $\Lambda'$, is a subset of points in $\Lambda$ that themselves constitute a lattice. In algebraic
terms, a sublattice is a subgroup of the original lattice.

We already know that $V(\mathbb{Z}^4) = 1$. From Equation 4.7-6, we have
$V(R\mathbb{Z}^4) = |\det(R)| = 4$. From this it is clear that one-quarter of the points in $\mathbb{Z}^4$ belong
to $R\mathbb{Z}^4$. This can also be seen from the fact that only one-quarter of the points in $\mathbb{Z}^4$ have
the sum of the first two components and the sum of the last two components both even.
Therefore, we conclude that $\mathbb{Z}^4$ can be partitioned into four subsets that are all congruent
to $R\mathbb{Z}^4$. We will discuss the notion of lattice partitioning and coset decomposition of
lattices in Chapter 8 in the discussion of coset codes.

Another example of a multidimensional lattice is the four-dimensional Schläfli
lattice denoted by $D_4$. One generator matrix for this lattice is

$$G = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$ (4.7-11)

This lattice represents all 4-tuples with integer coordinates in which the sum of the four
coordinates is even, similar to $R\mathbb{Z}^2$ in a plane. For this lattice $V(D_4) = |\det(G)| = 2$,
and the minimum distance is the distance between points $(0, 0, 0, 0)$ and $(1, 1, 0, 0)$,
thus $d_{\min}(D_4) = \sqrt{2}$. It can be easily seen that the kissing number for this lattice is
$N_{\min}(D_4) = 24$ and

$$\gamma_c(D_4) = \frac{d_{\min}^2(D_4)}{[V(D_4)]^{2/n}} = \frac{2}{2^{1/2}} = \sqrt{2} \approx 1.414$$ (4.7-12)

This shows that $D_4$ is approximately 41% denser than $\mathbb{Z}^4$.


Sphere Packing and Lattice Density 

For any $n$-dimensional lattice $\Lambda$, the set of $n$-dimensional spheres of radius $d_{\min}(\Lambda)/2$
centered at all lattice points constitutes a set of nonoverlapping spheres that cover a
fraction of the $n$-dimensional space. A measure of denseness of a lattice is the fraction
of the $n$-dimensional space covered by these spheres. The problem of packing the space
with $n$-dimensional spheres such that the highest fraction of the space is covered, or
equivalently, packing as many spheres as possible in a given volume of space, is called
the sphere packing problem.

In the one-dimensional space, all lattices are equivalent to $\mathbb{Z}$ and the sphere packing
problem becomes trivial. In this space, spheres are simply intervals of length 1 centered
at lattice points. These spheres cover the entire length, and therefore the fraction of the
space covered by these spheres is 1.

In Problem 4.56, it is shown that the volume of an $n$-dimensional sphere with radius
$R$ is given by $V_n(R) = B_n R^n$, where

$$B_n = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}$$ (4.7-13)

The gamma function is defined in Equation 2.3-22. In particular, note that from Equation
2.3-23 we have

$$\Gamma\left(\frac{n}{2} + 1\right) = \begin{cases} \left(\frac{n}{2}\right)! & n \text{ even and positive} \\ \sqrt{\pi}\, \dfrac{n(n-2)(n-4) \cdots 3 \times 1}{2^{(n+1)/2}} & n \text{ odd and positive} \end{cases}$$ (4.7-14)

Substituting Equation 4.7-14 into 4.7-13 yields

$$B_n = \begin{cases} \dfrac{\pi^{n/2}}{\left(\frac{n}{2}\right)!} & n \text{ even} \\ \dfrac{2^n \pi^{(n-1)/2} \left(\frac{n-1}{2}\right)!}{n!} & n \text{ odd} \end{cases}$$ (4.7-15)

Therefore,

$$V_n(R) = \begin{cases} \dfrac{\pi^{n/2}}{\left(\frac{n}{2}\right)!}\, R^n & n \text{ even} \\ \dfrac{2^n \pi^{(n-1)/2} \left(\frac{n-1}{2}\right)!}{n!}\, R^n & n \text{ odd} \end{cases}$$ (4.7-16)

FIGURE 4.7-4
The volume of an $n$-dimensional sphere with radius 1.

Clearly, $B_n$ denotes the volume of an $n$-dimensional sphere with radius 1. A plot of $B_n$
for different values of $n$ is shown in Figure 4.7-4. It is interesting to note that for large
$n$ the value of $B_n$ goes to zero, and it has a maximum at $n = 5$.
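The behavior of $B_n$ is easy to reproduce from Equation 4.7-13. The sketch below is a small illustration with hypothetical names; it relies on `math.gamma` from the Python standard library, and the range of dimensions scanned is an arbitrary choice.

```python
import math

def unit_sphere_volume(n):
    """B_n = pi^(n/2) / Gamma(n/2 + 1), the volume of the unit n-sphere."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0)

volumes = {n: unit_sphere_volume(n) for n in range(1, 21)}
n_at_max = max(volumes, key=volumes.get)  # dimension with the largest B_n
```

Over this range the largest value occurs at $n = 5$, where $B_5 = 8\pi^2/15 \approx 5.26$, and by $n = 20$ the volume has already dropped below 0.03.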

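The volume of the space that corresponds to each lattice point is $V(\Lambda)$, the fundamental
volume of the lattice. We define the density of a lattice $\Lambda$, denoted by $\Delta(\Lambda)$,
as the ratio of the volume of a sphere with radius $d_{\min}(\Lambda)/2$ to the fundamental volume of
the lattice. This ratio is the fraction of the space covered by the spheres of radius
$d_{\min}(\Lambda)/2$ centered at lattice points. From this definition we have

$$\begin{aligned}
\Delta(\Lambda) &= \frac{V_n\!\left(\frac{d_{\min}(\Lambda)}{2}\right)}{V(\Lambda)} \\
&= \frac{B_n}{V(\Lambda)} \left(\frac{d_{\min}(\Lambda)}{2}\right)^n \\
&= \frac{B_n}{2^n} \left(\frac{d_{\min}^2(\Lambda)}{[V(\Lambda)]^{2/n}}\right)^{n/2} \\
&= \frac{B_n}{2^n}\, \gamma_c^{n/2}(\Lambda)
\end{aligned}$$ (4.7-17)

where we have used the definition of $\gamma_c(\Lambda)$ given in Equation 4.7-8.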

example 4.7-1. To obtain the density of $\mathbb{Z}^2$, we note that for this lattice $n = 2$,
$d_{\min} = 1$, and $V(\mathbb{Z}^2) = 1$. Substituting in Equation 4.7-17, we obtain

$$
\Delta(\mathbb{Z}^2) = \frac{B_n}{V(\Lambda)} \left(\frac{d_{\min}(\Lambda)}{2}\right)^n
= \pi \left(\frac{1}{2}\right)^2 = \frac{\pi}{4} = 0.7854 \tag{4.7-18}
$$

For $A_2$ we have $n = 2$, $d_{\min} = 1$, and $V(A_2) = \frac{\sqrt{3}}{2}$. Therefore,

$$
\Delta(A_2) = \frac{B_n}{V(\Lambda)} \left(\frac{d_{\min}(\Lambda)}{2}\right)^n
= \frac{\pi/4}{\sqrt{3}/2} = \frac{\pi}{2\sqrt{3}} = 0.9069 \tag{4.7-19}
$$

This shows that $A_2$ is denser than $\mathbb{Z}^2$.

It can be shown that among all two-dimensional lattices, $A_2$ has the highest density.
Therefore the hexagonal lattice provides the best sphere packing in the plane.

example 4.7-2. For $D_4$, the Schläfli lattice, we have $n = 4$, $d_{\min}(D_4) = \sqrt{2}$, and
$V(D_4) = 2$. Therefore,

$$
\Delta(D_4) = \frac{B_n}{V(\Lambda)} \left(\frac{d_{\min}(\Lambda)}{2}\right)^n
= \frac{\pi^2}{16} = 0.6169 \tag{4.7-20}
$$
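The three densities above follow directly from Equation 4.7-17 and can be reproduced with a few lines of Python. This is a minimal sketch: `lattice_density` is a hypothetical helper name, and the parameters $(n, d_{\min}, V)$ are the values quoted in Examples 4.7-1 and 4.7-2, with $V(A_2) = \sqrt{3}/2$ for the hexagonal lattice normalized to $d_{\min} = 1$.

```python
import math


def unit_ball_volume(n):
    """B_n of Equation 4.7-13."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def lattice_density(n, d_min, fundamental_volume):
    """Equation 4.7-17: fraction of space covered by spheres of radius d_min/2."""
    return unit_ball_volume(n) * (d_min / 2) ** n / fundamental_volume


# (dimension, minimum distance, fundamental volume) for each lattice
lattices = {
    "Z^2": (2, 1.0, 1.0),
    "A_2": (2, 1.0, math.sqrt(3) / 2),
    "D_4": (4, math.sqrt(2), 2.0),
}

for name, (n, d_min, volume) in lattices.items():
    print(f"{name}: density = {lattice_density(n, d_min, volume):.4f}")
```

Densities in different dimensions are not directly comparable, which is why $D_4$ can be a very good four-dimensional packing even though its density is numerically smaller than that of $A_2$.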


4.7-2 Signal Constellations from Lattices 

A signal constellation $\mathcal{C}$ can be carved from a lattice by choosing the points of a
lattice, or a shifted version of it, that are within some region $\mathcal{R}$. The signal points
are therefore the intersection of the lattice points, or its shift, and region $\mathcal{R}$, i.e.,
$\mathcal{C}(\Lambda, \mathcal{R}) = (\Lambda + \boldsymbol{a}) \cap \mathcal{R}$, where $\boldsymbol{a}$
denotes a possible shift in lattice points. For instance, in Figure 4.7-2, the points of the
constellation belong to $\mathbb{Z}^2 + \left(\frac{1}{2}, \frac{1}{2}\right)$, and the region
$\mathcal{R}$ is either a square or a cross-shaped region depending on the constellation size.
For $M = 4, 16, 64$, $\mathcal{R}$ is a square; and for $M = 8, 32$ it has a cross shape.
The constellation size $M$ is the number of lattice (or shifted lattice) points within the
boundary. Since $V(\Lambda)$ is the reciprocal of the number of lattice points per unit volume,
we conclude that if the volume of the region $\mathcal{R}$, denoted by $V(\mathcal{R})$, is much
larger than $V(\Lambda)$, then

$$
M \approx \frac{V(\mathcal{R})}{V(\Lambda)} \tag{4.7-21}
$$

The average energy of a constellation with equiprobable messages is

$$
\mathcal{E}_{\text{avg}} = \frac{1}{M} \sum_{m=1}^{M} \|\boldsymbol{x}_m\|^2 \tag{4.7-22}
$$

For a large constellation we can use the continuous approximation by assuming that the
probability is uniformly distributed on the region $\mathcal{R}$, and by finding the second moment
of the region as

$$
\mathcal{E}(\mathcal{R}) = \frac{1}{V(\mathcal{R})} \int_{\mathcal{R}} \|\boldsymbol{x}\|^2 \, d\boldsymbol{x} \tag{4.7-23}
$$

For large values of $M$, $\mathcal{E}(\mathcal{R})$ is quite close to $\mathcal{E}_{\text{avg}}$.
Table 4.7-1 gives values of $\mathcal{E}(\mathcal{R})$ and $\mathcal{E}_{\text{avg}}$ for
$M = 16, 64, 256$ for a square constellation. The last column of this table gives
the relative error in substituting the average energy with the continuous approximation.

TABLE 4.7-1

Average Energy and Its Continuous Approximation
for Square Constellations

M      E_avg     E(R)      [E(R) - E_avg] / E(R)
16     5/2       8/3       0.06
64     21/2      32/3      0.015
256    85/2      128/3     0.004
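The entries of Table 4.7-1 can be regenerated exactly with rational arithmetic. The sketch below assumes the square constellation is carved from $\mathbb{Z}^2 + (\frac{1}{2}, \frac{1}{2})$ with a square boundary of side $\sqrt{M}$ centered at the origin; the function name `square_constellation_energies` is an illustrative choice.

```python
from fractions import Fraction


def square_constellation_energies(M):
    """Return (E_avg, E(R)) for a square M-point constellation.

    E_avg is the exact average energy (Equation 4.7-22); E(R) is the second
    moment of the bounding square (Equation 4.7-23).
    """
    side = int(round(M ** 0.5))
    if side * side != M or side % 2 != 0:
        raise ValueError("M must be an even perfect square")
    # Per-dimension coordinates: -(side-1)/2, ..., -1/2, 1/2, ..., (side-1)/2
    coords = [Fraction(2 * i + 1 - side, 2) for i in range(side)]
    e_avg = 2 * sum(c * c for c in coords) / side   # two identical dimensions
    e_cont = 2 * Fraction(side * side, 12)          # uniform on [-side/2, side/2]^2
    return e_avg, e_cont


for M in (16, 64, 256):
    e_avg, e_cont = square_constellation_energies(M)
    rel_err = (e_cont - e_avg) / e_cont
    print(f"M = {M:3d}  E_avg = {e_avg}  E(R) = {e_cont}  rel. error = {float(rel_err):.4f}")
```

With these assumptions the relative error works out to exactly $1/M$, which is why the continuous approximation improves so quickly as the constellation grows.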



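To be able to compare an $n$-dimensional constellation $\mathcal{C}$ with QAM, we define the
average energy per two dimensions as

$$
\mathcal{E}_{\text{avg/2D}}(\mathcal{C}) = \frac{2}{n}\, \mathcal{E}_{\text{avg}}
= \frac{2}{nM} \sum_{m=1}^{M} \|\boldsymbol{x}_m\|^2 \tag{4.7-24}
$$

Using the continuous approximation, the average energy per two dimensions can be
well approximated by

$$
\mathcal{E}_{\text{avg/2D}} \approx \frac{2}{n V(\mathcal{R})} \int_{\mathcal{R}} \|\boldsymbol{x}\|^2 \, d\boldsymbol{x} \tag{4.7-25}
$$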


Error Probability and Constellation Figure of Merit 

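In a lattice-based constellation, each signal point has $N_{\min}$ nearest neighbors; therefore
at high SNRs we have

$$
P_e \approx N_{\min}\, Q\left(\sqrt{\frac{d_{\min}^2}{2N_0}}\right) \tag{4.7-26}
$$

An efficient constellation provides large $d_{\min}$ at a given average energy. To study and
compare the efficiency of different constellations, we express the error probability as

$$
P_e \approx N_{\min}\, Q\left(\sqrt{\frac{d_{\min}^2}{2\,\mathcal{E}_{\text{avg/2D}}} \cdot \frac{\mathcal{E}_{\text{avg/2D}}}{N_0}}\right) \tag{4.7-27}
$$

The term $\mathcal{E}_{\text{avg/2D}}/N_0$ represents the average SNR per two dimensions and is
denoted by $\text{SNR}_{\text{avg/2D}}$. The numerator of $\text{SNR}_{\text{avg/2D}}$ is the average
signal energy per two dimensions, and its denominator is the noise power per two dimensions.
If we define the constellation figure of merit (CFM) as

$$
\text{CFM}(\mathcal{C}) = \frac{d_{\min}^2(\mathcal{C})}{\mathcal{E}_{\text{avg/2D}}(\mathcal{C})} \tag{4.7-28}
$$

where $\mathcal{E}_{\text{avg/2D}}(\mathcal{C})$ is given by Equation 4.7-24, we can express the error
probability from Equation 4.7-27 as

$$
P_e \approx N_{\min}\, Q\left(\sqrt{\frac{\text{CFM}(\mathcal{C})}{2} \cdot \frac{\mathcal{E}_{\text{avg/2D}}}{N_0}}\right)
= N_{\min}\, Q\left(\sqrt{\frac{\text{CFM}(\mathcal{C})}{2} \cdot \text{SNR}_{\text{avg/2D}}}\right) \tag{4.7-29}
$$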


Clearly the constellation figure of merit determines the coefficient by which
$\mathcal{E}_{\text{avg/2D}}(\mathcal{C})$ is scaled in the expression of error probability.

For a square QAM constellation, from Equation 3.2-41 we have

$$
d_{\min}^2 = \frac{6\,\mathcal{E}_{\text{avg}}}{M - 1} \tag{4.7-30}
$$


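Therefore,

$$
\text{CFM} = \frac{6}{M - 1} \tag{4.7-31}
$$

Note that from Equation 4.3-30 we have

$$
P_e \approx 4Q\left(\sqrt{\frac{3\,\mathcal{E}_{\text{avg}}}{(M-1)\,N_0}}\right)
= 4Q\left(\sqrt{\frac{\text{CFM}}{2} \cdot \frac{\mathcal{E}_{\text{avg}}}{N_0}}\right) \tag{4.7-32}
$$

which is in agreement with Equation 4.7-29. Also note that in a square QAM constellation,
for large $M$ we can write

$$
\text{CFM} \approx \frac{6}{M} = \frac{6}{2^k} \tag{4.7-33}
$$

where $k$ denotes the number of bits per two dimensions.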
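Equation 4.7-31 can be verified by brute force from the definition in Equation 4.7-28. The sketch below assumes a square $M$-QAM constellation with unit spacing (points of $\mathbb{Z}^2 + (\frac{1}{2}, \frac{1}{2})$, so $d_{\min} = 1$); since $n = 2$, the average energy per two dimensions equals the average energy. The function name `square_qam_cfm` is hypothetical.

```python
import itertools
import math


def square_qam_cfm(M):
    """CFM = d_min^2 / E_avg/2D for square M-QAM with unit point spacing."""
    side = math.isqrt(M)
    if side * side != M:
        raise ValueError("M must be a perfect square")
    coords = [i - (side - 1) / 2 for i in range(side)]   # centered at the origin
    points = itertools.product(coords, repeat=2)
    e_avg = sum(x * x + y * y for x, y in points) / M
    d_min_sq = 1.0                                       # adjacent points are 1 apart
    return d_min_sq / e_avg


for M in (4, 16, 64, 256, 1024):
    exact = square_qam_cfm(M)
    print(f"M = {M:4d}  CFM = {exact:.6f}  6/(M-1) = {6 / (M - 1):.6f}  6/M = {6 / M:.6f}")
```

The printed columns show that the enumerated CFM matches $6/(M-1)$ and that the large-$M$ approximation $6/M$ of Equation 4.7-33 is already within about 0.1 percent at $M = 1024$.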


Coding and Shaping Gains 

In Problem 4.57 we consider a constellation $\mathcal{C}$ based on the intersection of the shifted
lattice $\mathbb{Z}^n + \left(\frac{1}{2}, \frac{1}{2}, \ldots, \frac{1}{2}\right)$ and the boundary
region $\mathcal{R}$ defined as an $n$-dimensional hypercube centered at the origin with side
length $L$. In this problem it is shown that when $n$ is even, and $L = 2^{\ell}$ is a power of 2,
the number of bits per two dimensions, denoted by $\beta$, is equal to $2\ell + 2$, and
$\text{CFM}(\mathcal{C})$ is approximated by

$$
\text{CFM}(\mathcal{C}) \approx \frac{6}{2^{\beta}} \tag{4.7-34}
$$

which is equal to what we obtained for a square QAM. Since the $\mathbb{Z}^n$ with the cubic
boundary is the simplest possible $n$-dimensional constellation, its CFM is taken as the
baseline CFM to which the CFMs of other constellations are compared. This baseline
constellation figure of merit is denoted by $\text{CFM}_0$. Note that in an $n$-dimensional
constellation of size $M$, the number of bits per two dimensions is

$$
\beta = \frac{2}{n} \log_2 M \tag{4.7-35}
$$

Hence,

$$
2^{\beta} = M^{2/n} \tag{4.7-36}
$$

From this and Equation 4.7-21, we have

$$
2^{\beta} = \left[\frac{V(\mathcal{R})}{V(\Lambda)}\right]^{2/n} \tag{4.7-37}
$$

Using this result in Equation 4.7-34 gives the value of the baseline constellation figure
of merit as

$$
\text{CFM}_0 = \frac{6}{2^{\beta}} = 6 \left[\frac{V(\Lambda)}{V(\mathcal{R})}\right]^{2/n} \tag{4.7-38}
$$


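From Equations 4.7-28 and 4.7-38 we have

$$
\frac{\text{CFM}(\mathcal{C})}{\text{CFM}_0}
= \frac{d_{\min}^2(\Lambda)}{[V(\Lambda)]^{2/n}} \times \frac{[V(\mathcal{R})]^{2/n}}{6\,\mathcal{E}_{\text{avg/2D}}} \tag{4.7-39}
$$

Now we define the shaping gain of region $\mathcal{R}$ as

$$
\gamma_s(\mathcal{R}) = \frac{[V(\mathcal{R})]^{2/n}}{6\,\mathcal{E}_{\text{avg/2D}}}
= \frac{n\,[V(\mathcal{R})]^{1 + 2/n}}{12 \int_{\mathcal{R}} \|\boldsymbol{x}\|^2 \, d\boldsymbol{x}} \tag{4.7-40}
$$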


where in the last step we used Equation 4.7-25. It can be shown that the shaping gain
is independent of scaling and orthogonal transformations of the region $\mathcal{R}$. It can also
be shown that $\gamma_s(\mathcal{R}^M) = \gamma_s(\mathcal{R})$, where $\mathcal{R}^M$ denotes the
$M$-fold Cartesian product of the boundary region $\mathcal{R}$. From these, and the properties of
$\gamma_c(\Lambda)$, it is clear that scaling, orthogonal transformation, and Cartesian product of
$\Lambda$ and $\mathcal{R}$ have no effect on the figure of merit of the constellation based on
$\Lambda$ and $\mathcal{R}$.

From Equation 4.7-39 we have

$$
\text{CFM}(\mathcal{C}) \approx \text{CFM}_0 \cdot \gamma_c(\Lambda) \cdot \gamma_s(\mathcal{R}) \tag{4.7-41}
$$

This relation shows that the relative gain of a given constellation over the baseline
constellation can be viewed as the product of two independent terms, namely, the
fundamental coding gain of the lattice, denoted by $\gamma_c(\Lambda)$ and given by Equation 4.7-8,
and the shaping gain of region $\mathcal{R}$, denoted by $\gamma_s(\mathcal{R})$ and given in
Equation 4.7-40. The fundamental coding gain depends on the choice of the lattice. Choosing a
dense lattice with high coding gain that provides large minimum distance per unit volume, or,
equivalently, requires low volume for a given minimum distance, is highly desirable and improves
the performance. Similarly, the shaping gain depends only on the choice of the boundary of
the constellation, and choosing a region $\mathcal{R}$ with high shaping gain improves the power
efficiency of the constellation and results in improved performance of the system.

In Problem 4.57 it is shown that if $\mathcal{R}$ is an $n$-dimensional hypercube centered at
the origin, then $\gamma_s(\mathcal{R}) = 1$.

example 4.7-3. For a circle of radius $r$, we have $V(\mathcal{R}) = \pi r^2$ and

$$
\iint_{x^2 + y^2 \le r^2} (x^2 + y^2)\, dx\, dy
= \int_0^{2\pi}\!\!\int_0^{r} z^2\, z\, dz\, d\theta = \frac{\pi}{2}\, r^4 \tag{4.7-42}
$$

Therefore,

$$
\gamma_s(\mathcal{R}) = \frac{n\,[V(\mathcal{R})]^{1 + 2/n}}{12 \int_{\mathcal{R}} \|\boldsymbol{x}\|^2 \, d\boldsymbol{x}}
= \frac{2(\pi r^2)^2}{6\pi r^4} = \frac{\pi}{3} \approx 1.0472 \sim 0.2 \text{ dB} \tag{4.7-43}
$$

Recall that $\gamma_c(A_2) \approx 1.1547 \sim 0.62$ dB; therefore a hexagonal constellation with a
circular boundary is capable of providing an asymptotic overall gain of 0.82 dB over
the baseline constellation.
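The 0.82-dB figure is simply the product of the two gains in Equation 4.7-41 expressed in decibels. A short Python check, assuming the normalization $d_{\min} = 1$ and $V(A_2) = \sqrt{3}/2$ for the hexagonal lattice (variable names are illustrative):

```python
import math


def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10 * math.log10(ratio)


n = 2
d_min, volume_a2 = 1.0, math.sqrt(3) / 2
gamma_c_a2 = d_min ** 2 / volume_a2 ** (2 / n)   # coding gain d_min^2 / V^(2/n)
gamma_s_circle = math.pi / 3                     # Equation 4.7-43

total_gain_db = to_db(gamma_c_a2 * gamma_s_circle)
print(f"coding gain  = {gamma_c_a2:.4f} ({to_db(gamma_c_a2):.2f} dB)")
print(f"shaping gain = {gamma_s_circle:.4f} ({to_db(gamma_s_circle):.2f} dB)")
print(f"overall gain = {total_gain_db:.2f} dB")
```

Because the gains multiply, their decibel values add, which gives roughly 0.62 dB + 0.20 dB.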


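example 4.7-4. As a generalization of Example 4.7-3, let us consider the case where
$\mathcal{R}$ is an $n$-dimensional sphere of radius $R$ centered at the origin. In this case

$$
\begin{aligned}
\int_{\mathcal{R}} \|\boldsymbol{x}\|^2 \, d\boldsymbol{x} &= \int_0^R r^2 \, dV_n(r) \\
&= \int_0^R r^2 \, d(B_n r^n) \\
&= B_n \int_0^R n\, r^{n+1} \, dr \\
&= \frac{n B_n}{n + 2}\, R^{n+2} \\
&= \frac{n}{n + 2}\, R^2\, V_n(R)
\end{aligned} \tag{4.7-44}
$$

Substituting this result into Equation 4.7-40 yields

$$
\gamma_s(\mathcal{R}) = \frac{n + 2}{12} \left(\frac{V_n^{1/n}(R)}{R}\right)^2 \tag{4.7-45}
$$

Note that $V_n^{1/n}(R)$ is the length of the side of an $n$-dimensional cube that has a volume
equal to an $n$-dimensional sphere of radius $R$. Substituting for $V_n(R)$ from
Equation 4.7-16 results in

$$
\gamma_s(\mathcal{R}) = \frac{(n + 2)\,\pi}{12 \left[\Gamma\left(\frac{n}{2} + 1\right)\right]^{2/n}} \tag{4.7-46}
$$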


A plot of $\gamma_s(\mathcal{R})$ for an $n$-dimensional sphere as a function of $n$ is shown in
Figure 4.7-5.

FIGURE 4.7-5

The shaping gain for an $n$-dimensional sphere.

It can be shown that among all possible boundaries in an $n$-dimensional space,
spherical boundaries are the most efficient. As the dimensionality of the space
increases, spherical boundaries can provide an asymptotic shaping gain of $\frac{\pi e}{6}$, which
is approximately 1.423, equivalent to 1.533 dB. Therefore, 1.533 dB is the maximum gain
that shaping can provide. Getting close to this bound requires high dimensional
constellations. For instance, increasing the dimensionality of the space to 100 will provide
a shaping gain of roughly 1.37 dB, and increasing it to 1000 provides a shaping gain
of 1.5066 dB.
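The numbers quoted above can be reproduced from Equation 4.7-46. The sketch below evaluates the gamma function through `math.lgamma`, because $\Gamma(n/2 + 1)$ itself overflows double precision for dimensions of a few hundred; `sphere_shaping_gain` is an illustrative name.

```python
import math


def sphere_shaping_gain(n):
    """Shaping gain of an n-dimensional sphere, Equation 4.7-46."""
    # [Gamma(n/2 + 1)]^(2/n) computed in the log domain to avoid overflow.
    log_gamma_power = (2.0 / n) * math.lgamma(n / 2 + 1)
    return (n + 2) * math.pi / 12 * math.exp(-log_gamma_power)


def to_db(ratio):
    return 10 * math.log10(ratio)


for n in (2, 4, 16, 100, 1000):
    gain = sphere_shaping_gain(n)
    print(f"n = {n:4d}  gamma_s = {gain:.4f}  ({to_db(gain):.4f} dB)")

limit = math.pi * math.e / 6
print(f"limit pi*e/6 = {limit:.4f}  ({to_db(limit):.3f} dB)")
```

For $n = 2$ the function returns $\pi/3$, in agreement with Example 4.7-3, and the gain approaches $\pi e/6$ only slowly as $n$ grows.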

Unlike shaping gain, the coding gain can be increased indefinitely by using high 
dimensional dense lattices. However, such lattices have very large kissing numbers. 
The effect of large kissing numbers dramatically offsets the effect of the increased 
coding gain, and the overall performance of the system will remain within the bounds 
predicted by Shannon and discussed in Chapter 6. 


4.8 DETECTION OF SIGNALING SCHEMES WITH MEMORY

When the signal has no memory, the symbol-by-symbol detector described in the pre- 
ceding sections of this chapter is optimum in the sense of minimizing the probability 
of a symbol error. On the other hand, when the transmitted signal has memory, i.e., the 
signals transmitted in successive symbol intervals are interdependent, then the optimum 
detector is a detector that bases its decisions on observation of a sequence of received sig- 
nals over successive signal intervals. In this section, we describe a maximum-likelihood 
sequence detection algorithm that searches for the minimum Euclidean distance path
through the trellis that characterizes the memory in the transmitted signal. Another
possible approach is a maximum a posteriori probability algorithm that makes decisions
on a symbol-by-symbol basis, but each symbol decision is based on an observation 
of a sequence of received signal vectors. This approach is similar to the maximum a 
posteriori detection rule used for decoding turbo codes, known as the BCJR algorithm, 
that will be discussed in Chapter 8. 


4.8-1 The Maximum Likelihood Sequence Detector 


Modulation systems with memory can be modeled as finite-state machines which can 
be represented by a trellis, and the transmitted signal sequence corresponds to a path 
through the trellis. Let us assume that the transmitted signal has a duration of K symbol 
intervals. If we consider transmission over K symbol intervals, and each path of length 
K through the trellis as a message signal, then the problem reduces to the optimal 
detection problem discussed earlier in this chapter. The number of messages in this case 
is equal to the number of paths through the trellis, and a maximum likelihood sequence 
detection (MLSD) algorithm selects the most likely path (sequence) corresponding to 
the received signal $r(t)$ over the $K$ signaling intervals. As we have seen before, ML
detection corresponds to selecting a path of K signals through the trellis such that the 
Euclidean distance between that path and r(t) is minimized. Note that since 


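$$
\int_0^{KT_s} |r(t) - s(t)|^2 \, dt = \sum_{k=1}^{K} \int_{(k-1)T_s}^{kT_s} |r(t) - s(t)|^2 \, dt \tag{4.8-1}
$$

the optimal detection rule becomes

$$
\begin{aligned}
\left(\hat{\boldsymbol{s}}^{(1)}, \hat{\boldsymbol{s}}^{(2)}, \ldots, \hat{\boldsymbol{s}}^{(K)}\right)
&= \arg\min_{(\boldsymbol{s}^{(1)}, \boldsymbol{s}^{(2)}, \ldots, \boldsymbol{s}^{(K)}) \in \Upsilon}
\sum_{k=1}^{K} \left\|\boldsymbol{r}^{(k)} - \boldsymbol{s}^{(k)}\right\|^2 \\
&= \arg\min_{(\boldsymbol{s}^{(1)}, \boldsymbol{s}^{(2)}, \ldots, \boldsymbol{s}^{(K)}) \in \Upsilon}
\sum_{k=1}^{K} D\left(\boldsymbol{r}^{(k)}, \boldsymbol{s}^{(k)}\right)
\end{aligned} \tag{4.8-2}
$$

where $\Upsilon$ denotes the trellis. The above argument applies to all modulation systems with
memory.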

As an example of the maximum-likelihood sequence detection algorithm, let us 
consider the NRZI signal described in Section 3.3. Its memory is characterized by the 
trellis shown in Figure 3.3-3. The signal transmitted in each signal interval is binary 
PAM. Hence, there are two possible transmitted signals corresponding to the signal 
points $s_1 = -s_2 = \sqrt{\mathcal{E}_b}$, where $\mathcal{E}_b$ is the energy per bit.

In searching through the trellis for the most likely sequence, it may appear that
we must compute the Euclidean distance for every possible sequence. For the NRZI
example, which employs binary modulation, the total number of sequences is $2^K$.
However, this is not the case. We may reduce the number of sequences in the trellis
search by using the Viterbi algorithm to eliminate sequences as new data are received
from the demodulator.

The Viterbi algorithm is a sequential trellis search algorithm for performing ML
sequence detection. It is described in Chapter 8 as a decoding algorithm for
convolutional codes. We describe it below in the context of the NRZI signal detection. We
assume that the search process begins initially at state $S_0$. The corresponding trellis is
shown in Figure 4.8-1.

At time $t = T$, we receive $r_1 = s_1^{(m)} + n_1$ from the demodulator, and at $t = 2T$, we
receive $r_2 = s_2^{(m)} + n_2$. Since the signal memory is 1 bit, which we denote by $L = 1$,
we observe that the trellis reaches its regular (steady-state) form after two transitions.
Thus, upon receipt of $r_2$ at $t = 2T$ (and thereafter), we observe that there are two signal
paths entering each of the nodes and two signal paths leaving each node. The two paths
entering node $S_0$ at $t = 2T$ correspond to the information bits (0, 0) and (1, 1) or,
equivalently, to the signal points $(-\sqrt{\mathcal{E}_b}, -\sqrt{\mathcal{E}_b})$ and
$(\sqrt{\mathcal{E}_b}, -\sqrt{\mathcal{E}_b})$, respectively. The two paths entering node $S_1$ at
$t = 2T$ correspond to the information bits (0, 1) and (1, 0) or, equivalently, to the signal
points $(-\sqrt{\mathcal{E}_b}, \sqrt{\mathcal{E}_b})$ and $(\sqrt{\mathcal{E}_b}, \sqrt{\mathcal{E}_b})$,
respectively.

For the two paths entering node $S_0$, we compute the two Euclidean distance metrics

$$
\begin{aligned}
D_0(0, 0) &= \left(r_1 + \sqrt{\mathcal{E}_b}\right)^2 + \left(r_2 + \sqrt{\mathcal{E}_b}\right)^2 \\
D_0(1, 1) &= \left(r_1 - \sqrt{\mathcal{E}_b}\right)^2 + \left(r_2 + \sqrt{\mathcal{E}_b}\right)^2
\end{aligned} \tag{4.8-3}
$$

by using the outputs $r_1$ and $r_2$ from the demodulator. The Viterbi algorithm compares
these two metrics and discards the path having the larger (greater-distance) metric.†
The other path with the lower metric is saved and is called the survivor at $t = 2T$. The
elimination of one of the two paths may be done without compromising the optimality
of the trellis search, because any extension of the path with the larger distance beyond
$t = 2T$ will always have a larger metric than the survivor that is extended along the
same path beyond $t = 2T$.

†Note that, for NRZI, the reception of $r_2$ from the demodulator neither increases nor decreases
the relative difference between the two metrics $D_0(0, 0)$ and $D_0(1, 1)$. At this point, one may
ponder the implications of this observation. In any case, we continue with the description of
the ML sequence detection based on the Viterbi algorithm.

Similarly, for the two paths entering node $S_1$ at $t = 2T$, we compute the two
Euclidean distance metrics

$$
\begin{aligned}
D_1(0, 1) &= \left(r_1 + \sqrt{\mathcal{E}_b}\right)^2 + \left(r_2 - \sqrt{\mathcal{E}_b}\right)^2 \\
D_1(1, 0) &= \left(r_1 - \sqrt{\mathcal{E}_b}\right)^2 + \left(r_2 - \sqrt{\mathcal{E}_b}\right)^2
\end{aligned} \tag{4.8-4}
$$

by using the outputs $r_1$ and $r_2$ from the demodulator. The two metrics are compared, and
the signal path with the larger metric is eliminated. Thus, at $t = 2T$, we are left with two
survivor paths, one at node $S_0$ and the other at node $S_1$, and their corresponding metrics.
The signal paths at nodes $S_0$ and $S_1$ are then extended along the two survivor paths.

Upon receipt of $r_3$ at $t = 3T$, we compute the metrics of the two paths entering
state $S_0$. Suppose the survivors at $t = 2T$ are the paths (0, 0) at $S_0$ and (0, 1) at $S_1$.
Then the two metrics for the paths entering $S_0$ at $t = 3T$ are

$$
\begin{aligned}
D_0(0, 0, 0) &= D_0(0, 0) + \left(r_3 + \sqrt{\mathcal{E}_b}\right)^2 \\
D_0(0, 1, 1) &= D_1(0, 1) + \left(r_3 + \sqrt{\mathcal{E}_b}\right)^2
\end{aligned} \tag{4.8-5}
$$

These two metrics are compared, and the path with the larger (greater-distance) metric
is eliminated. Similarly, the metrics for the two paths entering $S_1$ at $t = 3T$ are

$$
\begin{aligned}
D_1(0, 0, 1) &= D_0(0, 0) + \left(r_3 - \sqrt{\mathcal{E}_b}\right)^2 \\
D_1(0, 1, 0) &= D_1(0, 1) + \left(r_3 - \sqrt{\mathcal{E}_b}\right)^2
\end{aligned} \tag{4.8-6}
$$

These two metrics are compared, and the path with the larger (greater-distance) metric
is eliminated.

This process is continued as each new signal sample is received from the demodu- 
lator. Thus, the Viterbi algorithm computes two metrics for the two signal paths entering 
a node at each stage of the trellis search and eliminates one of the two paths at each 
node. The two survivor paths are then extended forward to the next state. Therefore, 
the number of paths searched in the trellis is reduced by a factor of 2 at each stage. 
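The add-compare-select procedure described above can be sketched in Python for the two-state NRZI trellis. This is an illustrative implementation under stated assumptions, not the book's code: the state starts at $S_0$, a 1 toggles the state and a 0 keeps it, state $S_0$ sends $-\sqrt{\mathcal{E}_b}$ and state $S_1$ sends $+\sqrt{\mathcal{E}_b}$ (the convention implied by Equations 4.8-3 and 4.8-4), and `amplitude` plays the role of $\sqrt{\mathcal{E}_b}$. All function names are hypothetical. An exhaustive search over all $2^K$ sequences is included only to confirm that the survivor-based search returns the same sequence.

```python
import itertools
import math
import random


def nrzi_encode(bits, amplitude=1.0):
    """Map bits to NRZI amplitudes, starting in state S0 (assumed convention)."""
    state, signal = 0, []
    for bit in bits:
        state ^= bit                                   # a 1 toggles the state
        signal.append(amplitude if state else -amplitude)
    return signal


def viterbi_nrzi(received, amplitude=1.0):
    """ML sequence detection on the two-state NRZI trellis."""
    # survivors[state] = (accumulated squared distance, bits on the survivor)
    survivors = {0: (0.0, []), 1: (math.inf, [])}      # search starts in S0
    for r in received:
        updated = {}
        for nxt in (0, 1):
            level = amplitude if nxt else -amplitude
            branch = (r - level) ** 2
            # Two paths enter each node; the branch bit is prev XOR nxt.
            candidates = [
                (survivors[prev][0] + branch, survivors[prev][1] + [prev ^ nxt])
                for prev in (0, 1)
            ]
            updated[nxt] = min(candidates, key=lambda c: c[0])   # keep survivor
        survivors = updated
    return min(survivors.values(), key=lambda c: c[0])[1]


def brute_force_nrzi(received, amplitude=1.0):
    """Exhaustive minimum-distance search over all 2**K sequences."""
    def distance(bits):
        return sum((r - s) ** 2 for r, s in zip(received, nrzi_encode(bits, amplitude)))
    return list(min(itertools.product((0, 1), repeat=len(received)), key=distance))


random.seed(7)
bits = [random.randint(0, 1) for _ in range(12)]
noisy = [s + random.gauss(0.0, 0.7) for s in nrzi_encode(bits)]
print("sent      :", bits)
print("Viterbi   :", viterbi_nrzi(noisy))
print("exhaustive:", brute_force_nrzi(noisy))
```

The Viterbi search keeps only two survivors per stage, so its work grows linearly with $K$, whereas the exhaustive search grows as $2^K$.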

It is relatively easy to generalize the trellis search performed by the Viterbi
algorithm for $M$-ary modulation. For example, consider a system that employs $M = 4$
signals and is characterized by the four-state trellis shown in Figure 4.8-2. We observe that
each state has two signal paths entering and two signal paths leaving each node. The
memory of the signal is $L = 1$. Hence, the Viterbi algorithm will have four survivors
at each stage and their corresponding metrics. Two metrics corresponding to the two
entering paths are computed at each node, and one of the two signal paths entering the
node is eliminated at each state of the trellis. Thus, the Viterbi algorithm minimizes the
number of trellis paths searched in performing ML sequence detection.

From the description of the Viterbi algorithm given above, it is unclear how
decisions are made on the individual detected information symbols given the surviving
sequences. If we have advanced to some stage, say $K$, where $K \gg L$ in the trellis,
and we compare the surviving sequences, we shall find that with high probability all
surviving sequences will be identical in bit (or symbol) positions $K - 5L$ and less. In
a practical implementation of the Viterbi algorithm, decisions on each information bit
(or symbol) are forced after a delay of $5L$ bits (or symbols), and hence the surviving
sequences are truncated to the $5L$ most recent bits (or symbols). Thus, a variable delay
in bit or symbol detection is avoided. The loss in performance resulting from the
suboptimum detection procedure is negligible if the delay is at least $5L$. This approach to
implementation of the Viterbi algorithm is called path memory truncation.

example 4.8-1. Consider the decision rule for detecting the data sequence in an NRZI
signal with a Viterbi algorithm having a delay of $5L$ bits. The trellis for the NRZI
signal is shown in Figure 4.8-1. In this case, $L = 1$; hence the delay in bit
detection is set to 5 bits. Hence, at $t = 6T$, we shall have two surviving sequences, one
for each of the two states, and the corresponding metrics $\mu_6(b_1, b_2, b_3, b_4, b_5, b_6)$ and
$\mu_6(b_1', b_2', b_3', b_4', b_5', b_6')$. At this stage, with probability nearly equal to 1, bit $b_1$
will be the same as $b_1'$; that is, both surviving sequences will have a common first branch.
If $b_1 \ne b_1'$, we may select the bit ($b_1$ or $b_1'$) corresponding to the smaller of the two
metrics. Then the first bit is dropped from the two surviving sequences. At $t = 7T$,
the two metrics $\mu_7(b_2, b_3, b_4, b_5, b_6, b_7)$ and $\mu_7(b_2', b_3', b_4', b_5', b_6', b_7')$ will be
used to determine the decision on bit $b_2$. This process continues at each stage of the search
through the trellis for the minimum-distance sequence. Thus the detection delay is fixed
at 5 bits.†

†One may have observed by now that the ML sequence detector and the symbol-by-symbol detector
that ignores the memory in the NRZI signal reach the same decision. Hence, there is no need for
a decision delay. Nevertheless, the procedure described above applies in general.


4.9 OPTIMUM RECEIVER FOR CPM SIGNALS

We recall from Section 3.3-2 that CPM is a modulation method with memory. The
memory results from the continuity of the transmitted carrier phase from one signal
interval to the next. The transmitted CPM signal may be expressed as

$$
s(t) = \sqrt{\frac{2\mathcal{E}}{T}} \cos\left[2\pi f_c t + \phi(t; \boldsymbol{I})\right] \tag{4.9-1}
$$

where $\phi(t; \boldsymbol{I})$ is the carrier phase. The filtered received signal for an additive
Gaussian noise channel is

$$
r(t) = s(t) + n(t) \tag{4.9-2}
$$

where

$$
n(t) = n_i(t) \cos 2\pi f_c t - n_q(t) \sin 2\pi f_c t \tag{4.9-3}
$$


4.9-1 Optimum Demodulation and Detection of CPM 

The optimum receiver for this signal consists of a correlator followed by a maximum- 
likelihood sequence detector that searches the paths through the state trellis for the 
minimum Euclidean distance path. The Viterbi algorithm is an efficient method for 
performing this search. Let us establish the general state trellis structure for CPM and 
then describe the metric computations. 

Recall that the carrier phase for a CPM signal with a fixed modulation index $h$ may
be expressed as

$$
\begin{aligned}
\phi(t; \boldsymbol{I}) &= 2\pi h \sum_{k=-\infty}^{n} I_k\, q(t - kT) \\
&= \pi h \sum_{k=-\infty}^{n-L} I_k + 2\pi h \sum_{k=n-L+1}^{n} I_k\, q(t - kT) \\
&= \theta_n + \theta(t; \boldsymbol{I}), \qquad nT \le t \le (n+1)T
\end{aligned} \tag{4.9-4}
$$

where we have assumed that $q(t) = 0$ for $t < 0$, $q(t) = \frac{1}{2}$ for $t \ge LT$, and

$$
q(t) = \int_0^t g(\tau)\, d\tau \tag{4.9-5}
$$

The signal pulse $g(t) = 0$ for $t < 0$ and $t \ge LT$. For $L = 1$, we have a full response
CPM, and for $L > 1$, where $L$ is a positive integer, we have a partial response CPM
signal.

Now, when $h$ is rational, i.e., $h = m/p$ where $m$ and $p$ are relatively prime positive
integers, the CPM scheme can be represented by a trellis. In this case, there are $p$ phase
states

$$
\Theta_s = \left\{0, \frac{\pi m}{p}, \frac{2\pi m}{p}, \ldots, \frac{(p-1)\pi m}{p}\right\} \tag{4.9-6}
$$

when $m$ is even, and $2p$ phase states

$$
\Theta_s = \left\{0, \frac{\pi m}{p}, \frac{2\pi m}{p}, \ldots, \frac{(2p-1)\pi m}{p}\right\} \tag{4.9-7}
$$

when $m$ is odd. If $L = 1$, these are the only states in the trellis. On the other hand, if
$L > 1$, we have an additional number of states due to the partial response character
of the signal pulse $g(t)$. These additional states can be identified by expressing
$\theta(t; \boldsymbol{I})$ given by Equation 4.9-4 as

$$
\theta(t; \boldsymbol{I}) = 2\pi h \sum_{k=n-L+1}^{n-1} I_k\, q(t - kT) + 2\pi h I_n\, q(t - nT) \tag{4.9-8}
$$

The first term on the right-hand side of Equation 4.9-8 depends on the information
symbols $(I_{n-1}, I_{n-2}, \ldots, I_{n-L+1})$, which is called the correlative state vector, and
represents the phase term corresponding to signal pulses that have not reached their final
value. The second term in Equation 4.9-8 represents the phase contribution due to
the most recent symbol $I_n$. Hence, the state of the CPM signal (or the modulator) at
time $t = nT$ may be expressed as the combined phase state and correlative state,
denoted as

$$
S_n = \{\theta_n, I_{n-1}, I_{n-2}, \ldots, I_{n-L+1}\} \tag{4.9-9}
$$

for a partial response signal pulse of length $LT$, where $L > 1$. In this case, the number
of states is

$$
N_s =
\begin{cases}
p M^{L-1} & (\text{even } m) \\
2p M^{L-1} & (\text{odd } m)
\end{cases} \tag{4.9-10}
$$

when $h = m/p$.

Now, suppose the state of the modulator at $t = nT$ is $S_n$. The effect of the new
symbol in the time interval $nT \le t \le (n+1)T$ is to change the state from $S_n$ to $S_{n+1}$.
Hence, at $t = (n+1)T$, the state becomes

$$
S_{n+1} = (\theta_{n+1}, I_n, I_{n-1}, \ldots, I_{n-L+2})
$$

where

$$
\theta_{n+1} = \theta_n + \pi h I_{n-L+1}
$$
example 4.9-1. Consider a binary CPM scheme with a modulation index $h = 3/4$
and a partial response pulse with $L = 2$. Let us determine the states $S_n$ of the CPM
scheme and sketch the phase tree and state trellis.

First, we note that there are $2p = 8$ phase states, namely,

$$
\Theta_s = \left\{0, \pm\tfrac{1}{4}\pi, \pm\tfrac{1}{2}\pi, \pm\tfrac{3}{4}\pi, \pi\right\}
$$

For each of these phase states, there are two states that result from the memory of the
CPM scheme. Hence, the total number of states is $N_s = 16$, namely,

$$
\begin{gathered}
(0, 1),\ (0, -1),\ (\pi, 1),\ (\pi, -1),\ (\tfrac{1}{4}\pi, 1),\ (\tfrac{1}{4}\pi, -1),\ (\tfrac{1}{2}\pi, 1),\ (\tfrac{1}{2}\pi, -1), \\
(\tfrac{3}{4}\pi, 1),\ (\tfrac{3}{4}\pi, -1),\ (-\tfrac{1}{4}\pi, 1),\ (-\tfrac{1}{4}\pi, -1),\ (-\tfrac{1}{2}\pi, 1),\ (-\tfrac{1}{2}\pi, -1), \\
(-\tfrac{3}{4}\pi, 1),\ (-\tfrac{3}{4}\pi, -1)
\end{gathered}
$$

If the system is in phase state $\theta_n = -\frac{1}{4}\pi$ and $I_{n-1} = -1$, then

$$
\theta_{n+1} = \theta_n + \pi h I_{n-1} = -\tfrac{1}{4}\pi - \tfrac{3}{4}\pi = -\pi
$$

The state trellis is illustrated in Figure 4.9-1. A path through the state trellis
corresponding to the sequence $(1, -1, -1, -1, 1, 1)$ is illustrated in Figure 4.9-2.

In order to sketch the phase tree, we must know the signal pulse shape $g(t)$. Figure
4.9-3 illustrates the phase tree when $g(t)$ is a rectangular pulse of duration $2T$, with
initial state (0, 1).

FIGURE 4.9-1

State trellis for the CPM scheme of Example 4.9-1, with columns labeled
$(\theta_n, I_{n-1})$ and $(\theta_{n+1}, I_n)$.
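The state bookkeeping of Equations 4.9-6 through 4.9-10 and the state update $\theta_{n+1} = \theta_n + \pi h I_{n-L+1}$ can be sketched in Python. As an assumption made for this sketch only, phases are stored as exact fractions of $\pi$ reduced modulo 2, so that $-\frac{1}{4}\pi$ appears as $\frac{7}{4}$ and $\pm\pi$ as 1; the function names are hypothetical.

```python
import itertools
from fractions import Fraction


def cpm_phase_states(h):
    """Phase states for h = m/p, as multiples of pi reduced modulo 2."""
    m, p = h.numerator, h.denominator            # Fraction keeps m/p in lowest terms
    count = p if m % 2 == 0 else 2 * p           # Equations 4.9-6 and 4.9-7
    return sorted({Fraction(k * m, p) % 2 for k in range(count)})


def cpm_states(h, M, L):
    """All states (theta_n, I_{n-1}, ..., I_{n-L+1}) of Equation 4.9-9."""
    symbols = range(-(M - 1), M, 2)              # +-1, +-3, ..., +-(M-1)
    return [(theta, *corr)
            for theta in cpm_phase_states(h)
            for corr in itertools.product(symbols, repeat=L - 1)]


def cpm_next_state(state, new_symbol, h):
    """State reached from `state` when the symbol I_n = new_symbol is sent."""
    theta, corr = state[0], state[1:]
    shifted = (new_symbol, *corr)                # (I_n, I_{n-1}, ..., I_{n-L+1})
    oldest = shifted[-1]                         # I_{n-L+1} is absorbed into the phase
    return ((theta + h * oldest) % 2, *shifted[:-1])


h = Fraction(3, 4)
states = cpm_states(h, M=2, L=2)
print("phase states (units of pi):", [str(t) for t in cpm_phase_states(h)])
print("number of states:", len(states))
# theta_n = -pi/4 (stored as 7/4) with I_{n-1} = -1 moves to theta_{n+1} = pi
print("next state:", cpm_next_state((Fraction(7, 4), -1), +1, h))
```

For $h = 3/4$, $M = 2$, $L = 2$ this enumerates 8 phase states and 16 states, and every transition stays inside the enumerated state set, which is what allows the scheme to be drawn as a trellis.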

Having established the state trellis representation of CPM, let us now consider the 
metric computations performed in the Viterbi algorithm. 


Metric Computations 

By referring to the mathematical development for the derivation of the maximum
likelihood demodulator given in Section 4.1, it is easy to show that the logarithm of the
probability of the observed signal $r(t)$ conditioned on a particular sequence of
transmitted symbols $\boldsymbol{I}$ is proportional to the cross-correlation metric

$$
\begin{aligned}
CM_n(\boldsymbol{I}) &= \int_{-\infty}^{(n+1)T} r(t) \cos\left[\omega_c t + \phi(t; \boldsymbol{I})\right] dt \\
&= CM_{n-1}(\boldsymbol{I}) + \int_{nT}^{(n+1)T} r(t) \cos\left[\omega_c t + \theta(t; \boldsymbol{I}) + \theta_n\right] dt
\end{aligned} \tag{4.9-11}
$$

FIGURE 4.9-2

A single signal path through the trellis.

FIGURE 4.9-3

Phase tree for $L = 2$ partial response CPM
with $h = \frac{3}{4}$.

The term $CM_{n-1}(\boldsymbol{I})$ represents the metrics for the surviving sequences up to time
$nT$, and the term

$$
v_n(\boldsymbol{I}; \theta_n) = \int_{nT}^{(n+1)T} r(t) \cos\left[\omega_c t + \theta(t; \boldsymbol{I}) + \theta_n\right] dt \tag{4.9-12}
$$

represents the additional increments to the metrics contributed by the signal in the
time interval $nT \le t \le (n+1)T$. Note that there are $M^L$ possible sequences
$\boldsymbol{I} = (I_n, I_{n-1}, \ldots, I_{n-L+1})$ of symbols and $p$ (or $2p$) possible phase states
$\{\theta_n\}$. Therefore, there are $pM^L$ (or $2pM^L$) different values of
$v_n(\boldsymbol{I}; \theta_n)$ computed in each signal interval, and each value is used to increment
the metrics corresponding to the $pM^{L-1}$ surviving sequences from the previous signaling
interval. A general block diagram that illustrates the computations of
$v_n(\boldsymbol{I}; \theta_n)$ for the Viterbi decoder is shown in Figure 4.9-4.

Note that the number of surviving sequences at each state of the Viterbi decoding
process is $pM^{L-1}$ (or $2pM^{L-1}$). For each surviving sequence, we have $M$ new
increments of $v_n(\boldsymbol{I}; \theta_n)$ that are added to the existing metrics to yield $pM^L$
(or $2pM^L$) sequences with $pM^L$ (or $2pM^L$) metrics. However, this number is then reduced
back to $pM^{L-1}$ (or $2pM^{L-1}$) survivors with corresponding metrics by selecting the most
probable sequence of the $M$ sequences merging at each node of the trellis and discarding
the other $M - 1$ sequences.


4.9-2 Performance of CPM Signals 

In evaluating the performance of CPM signals achieved with maximum-likelihood 
sequence detection, we must determine the minimum Euclidean distance of paths 
through the trellis that separate at the node at t = 0 and remerge at a later time at 
the same node. The distance between two paths through the trellis is related to the 
corresponding signals as we now demonstrate. 

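FIGURE 4.9-4

Computation of metric increments $v_n(\boldsymbol{I}; \theta_n)$.

Suppose that we have two signals $s_i(t)$ and $s_j(t)$ corresponding to two phase
trajectories $\phi(t; \boldsymbol{I}_i)$ and $\phi(t; \boldsymbol{I}_j)$. The sequences $\boldsymbol{I}_i$ and
$\boldsymbol{I}_j$ must be different in their first symbol. Then, the Euclidean distance between the
two signals over an interval of length $NT$, where $1/T$ is the symbol rate, is defined as

$$
\begin{aligned}
d_{ij}^2 &= \int_0^{NT} \left[s_i(t) - s_j(t)\right]^2 dt \\
&= \int_0^{NT} s_i^2(t)\, dt + \int_0^{NT} s_j^2(t)\, dt - 2 \int_0^{NT} s_i(t)\, s_j(t)\, dt \\
&= 2N\mathcal{E} - 2\, \frac{2\mathcal{E}}{T} \int_0^{NT} \cos\left[\omega_c t + \phi(t; \boldsymbol{I}_i)\right] \cos\left[\omega_c t + \phi(t; \boldsymbol{I}_j)\right] dt \\
&= 2N\mathcal{E} - \frac{2\mathcal{E}}{T} \int_0^{NT} \cos\left[\phi(t; \boldsymbol{I}_i) - \phi(t; \boldsymbol{I}_j)\right] dt \\
&= \frac{2\mathcal{E}}{T} \int_0^{NT} \left\{1 - \cos\left[\phi(t; \boldsymbol{I}_i) - \phi(t; \boldsymbol{I}_j)\right]\right\} dt
\end{aligned} \tag{4.9-13}
$$

where the last two steps use $\cos a \cos b = \frac{1}{2}[\cos(a - b) + \cos(a + b)]$ and neglect
the double-frequency term, which integrates to approximately zero when $f_c \gg 1/T$.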

Hence the Euclidean distance is related to the phase difference between the paths in the 
state trellis according to Equation 4.9-13. 

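It is desirable to express the distance $d_{ij}^2$ in terms of the bit energy. Since
$\mathcal{E} = \mathcal{E}_b \log_2 M$, Equation 4.9-13 may be expressed as

$$
d_{ij}^2 = 2\mathcal{E}_b\, \delta_{ij}^2 \tag{4.9-14}
$$

where $\delta_{ij}^2$ is defined as

$$
\delta_{ij}^2 = \frac{\log_2 M}{T} \int_0^{NT} \left\{1 - \cos\left[\phi(t; \boldsymbol{I}_i) - \phi(t; \boldsymbol{I}_j)\right]\right\} dt \tag{4.9-15}
$$

Furthermore, we observe that $\phi(t; \boldsymbol{I}_i) - \phi(t; \boldsymbol{I}_j) = \phi(t; \boldsymbol{I}_i - \boldsymbol{I}_j)$,
so that, with $\boldsymbol{\xi} = \boldsymbol{I}_i - \boldsymbol{I}_j$, Equation 4.9-15 may be written as

$$
\delta_{ij}^2 = \frac{\log_2 M}{T} \int_0^{NT} \left[1 - \cos \phi(t; \boldsymbol{\xi})\right] dt \tag{4.9-16}
$$

where any element of $\boldsymbol{\xi}$ can take the values $0, \pm 2, \pm 4, \ldots, \pm 2(M - 1)$, except
that $\xi_0 \ne 0$.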

The error rate performance for CPM is dominated by the term corresponding to
the minimum Euclidean distance, and it may be expressed as

$$
P_M = K_{\delta_{\min}}\, Q\left(\sqrt{\frac{\mathcal{E}_b}{N_0}\, \delta_{\min}^2}\right) \tag{4.9-17}
$$

where $K_{\delta_{\min}}$ is the number of paths having the minimum distance

$$
\begin{aligned}
\delta_{\min}^2 &= \lim_{N \to \infty} \min_{i,j} \delta_{ij}^2 \\
&= \lim_{N \to \infty} \min_{i,j} \left\{\frac{\log_2 M}{T} \int_0^{NT} \left[1 - \cos \phi(t; \boldsymbol{I}_i - \boldsymbol{I}_j)\right] dt\right\}
\end{aligned} \tag{4.9-18}
$$

We note that for conventional binary PSK with no memory, $N = 1$ and
$\delta_{\min}^2 = \delta_{12}^2 = 2$. Hence, Equation 4.9-17 agrees with our previous result.

Since $\delta_{\min}^2$ characterizes the performance of CPM, we can investigate the effect on
$\delta_{\min}^2$ resulting from varying the alphabet size $M$, the modulation index $h$, and the
length of the transmitted pulse in partial response CPM.

First, we consider full response ($L = 1$) CPM. If we take $M = 2$ as a beginning,
we note that the sequences

$$
\begin{aligned}
\boldsymbol{I}_i &= +1, -1, I_2, I_3 \\
\boldsymbol{I}_j &= -1, +1, I_2, I_3
\end{aligned} \tag{4.9-19}
$$

which differ for $k = 0, 1$ and agree for $k \ge 2$, result in two phase trajectories that merge
after the second symbol. This corresponds to the difference sequence

$$
\boldsymbol{\xi} = \{2, -2, 0, 0, \ldots\} \tag{4.9-20}
$$

The Euclidean distance for this sequence is easily calculated from Equation 4.9-16,
and provides an upper bound on $\delta_{\min}^2$. This upper bound for CPFSK with $M = 2$ is

$$
d_B^2(h) = 2\left(1 - \frac{\sin 2\pi h}{2\pi h}\right), \qquad M = 2 \tag{4.9-21}
$$

For example, where $h = \frac{1}{2}$, which corresponds to MSK, we have $d_B^2(\frac{1}{2}) = 2$, so
that $\delta_{\min}^2(\frac{1}{2}) \le 2$.

For $M > 2$ and full response CPM, it is also easily seen that phase trajectories
merge at $t = 2T$. Hence, an upper bound on $\delta_{\min}^2$ can be obtained by considering the
phase difference sequence $\boldsymbol{\xi} = \{\alpha, -\alpha, 0, 0, \ldots\}$ where
$\alpha = \pm 2, \pm 4, \ldots, \pm 2(M - 1)$. This sequence yields the upper bound for $M$-ary
CPFSK as

$$
d_B^2(h) = \min_{1 \le k \le M-1} \left\{(2 \log_2 M) \left(1 - \frac{\sin 2k\pi h}{2k\pi h}\right)\right\} \tag{4.9-22}
$$

The graphs of $d_B^2(h)$ versus $h$ for $M = 2, 4, 8, 16$ are shown in Figure 4.9-5.
It is apparent from these graphs that large gains in performance can be achieved by
increasing the alphabet size $M$. It must be remembered, however, that
$\delta_{\min}^2(h) \le d_B^2(h)$. That is, the upper bound may not be achievable for all values of $h$.
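The bound of Equation 4.9-22 is easy to evaluate, and it can be cross-checked against a direct numerical integration of Equation 4.9-16. The sketch below assumes full response CPFSK with a rectangular pulse, so that $q(t) = t/(2T)$ on $[0, T]$, and normalizes $T = 1$; the function names and the midpoint-rule integrator are illustrative choices.

```python
import math


def d_b_squared(h, M):
    """Upper bound of Equation 4.9-22 (Equation 4.9-21 when M = 2)."""
    def term(k):
        x = 2 * k * math.pi * h
        return 2 * math.log2(M) * (1 - math.sin(x) / x)
    return min(term(k) for k in range(1, M))


def q_rect(t):
    """Phase pulse of full response CPFSK with rectangular g(t), T = 1."""
    if t <= 0:
        return 0.0
    return 0.5 if t >= 1 else t / 2


def delta_squared_numeric(h, xi, M=2, steps_per_symbol=2000):
    """Midpoint-rule evaluation of Equation 4.9-16 over N = len(xi) symbols."""
    dt = 1.0 / steps_per_symbol
    total = 0.0
    for i in range(len(xi) * steps_per_symbol):
        t = (i + 0.5) * dt
        phase = 2 * math.pi * h * sum(x * q_rect(t - k) for k, x in enumerate(xi))
        total += (1 - math.cos(phase)) * dt
    return math.log2(M) * total


print("MSK (h = 1/2, M = 2):", d_b_squared(0.5, 2))
print("numeric check       :", round(delta_squared_numeric(0.5, (2, -2)), 6))
print("h = 0.715, M = 2    :", round(d_b_squared(0.715, 2), 3))
for h in (0.25, 0.5, 0.75, 0.9):
    print(f"M = 4, h = {h}: d_B^2 = {d_b_squared(h, 4):.3f}")
```

The numerical integral for the difference sequence $\{2, -2\}$ agrees with the closed form, which confirms that Equation 4.9-21 is Equation 4.9-16 evaluated for the earliest possible merger.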



FIGURE 4.9-5 

The upper bound as a function of the 
modulation index h for full response CPM 
with rectangular pulses. [From Aulin and 
Sundberg (1984). © 1984 John Wiley Ltd. 
Reprinted with permission of the publisher. ] 
The minimum Euclidean distance $\delta^2_{\min}(h)$ has been determined, by evaluating 
Equation 4.9-16, for a variety of CPM signals by Aulin and Sundberg (1981). For 
example, Figure 4.9-6 illustrates the dependence of the Euclidean distance for binary 
CPFSK as a function of the modulation index h, with the number N of bit observation 
(decision) intervals (N = 1, 2, 3, 4) as a parameter. Also shown is the upper 
bound $d_B^2(h)$ given by Equation 4.9-21. In particular, we note that when $h = \tfrac{1}{2}$, 
$\delta^2_{\min}(\tfrac{1}{2}) = 2$, which is the same squared distance as PSK (binary or quaternary) with 
N = 1. On the other hand, the required observation interval for MSK is N = 2 
intervals, for which we have $\delta^2_{\min}(\tfrac{1}{2}) = 2$. Hence, the performance of MSK with a 
Viterbi detector is comparable to (binary or quaternary) PSK as we have previously 
observed. 

We also note from Figure 4.9-6 that the optimum modulation index for binary 
CPFSK is h = 0.715 when the observation interval is N = 3. This yields $\delta^2_{\min}(0.715) = 2.43$, 
or a gain of 0.85 dB relative to MSK. 

Figure 4.9-7 illustrates the Euclidean distance as a function of h for M = 4 CPFSK, 
with the length of the observation interval N as a parameter. Also shown (as a dashed 
line where it is not reached) is the upper bound $d_B^2$ evaluated from Equation 4.9-22. 
Note that $\delta^2_{\min}$ achieves the upper bound for several values of h for some N. In particular, 
note that the maximum value of $d_B^2$, which occurs at h ≈ 0.9, is approximately reached 
for N = 8 observed symbol intervals. The true maximum is achieved at h = 0.914 
with N = 9. For this case, $\delta^2_{\min}(0.914) = 4.2$, which represents a 3.2-dB gain over 
MSK. Also note that the Euclidean distance contains minima at $h = \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{2}{3}, 1$, etc. 
These values of h are called weak modulation indices and should be avoided. Similar 
results are available for larger values of M and may be found in the paper by Aulin and 
Sundberg (1981) and the text by Anderson et al. (1986). 
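
As a quick check on the quoted gains, each one is the ratio of squared minimum distances expressed in decibels, with MSK ($\delta^2_{\min} = 2$) taken as the reference:

$$10\log_{10}\frac{2.43}{2} \approx 0.85\ \text{dB}, \qquad 10\log_{10}\frac{4.2}{2} \approx 3.2\ \text{dB}$$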

FIGURE 4.9-6 

Squared minimum Euclidean distance as a function 
of the modulation index for binary CPFSK. The 
upper bound is $d_B^2$. [From Aulin and Sundberg 
(1981), © 1981 IEEE.] 



FIGURE 4.9-7 

Squared minimum Euclidean distance as a 
function of the modulation index for 
quaternary CPFSK. The upper bound is $d_B^2$. 
[From Aulin and Sundberg (1981), © 1981 
IEEE.] 


Large performance gains can also be achieved with maximum-likelihood sequence 
detection of CPM by using partial response signals. For example, the distance bound 
$d_B^2(h)$ for partial response, raised cosine pulses given by 

$$g(t) = \begin{cases} \dfrac{1}{2LT}\left(1 - \cos\dfrac{2\pi t}{LT}\right) & 0 \le t \le LT \\ 0 & \text{otherwise} \end{cases} \tag{4.9-23}$$

is shown in Figure 4.9-8 for M = 2. Here, note that, as L increases, $d_B^2$ also achieves 
higher values. Clearly, the performance of CPM improves as the correlative memory 
L increases, but h must also be increased in order to achieve the larger values of $d_B^2$. 
Since a larger modulation index implies a larger bandwidth (for fixed L), while a larger 
memory length L (for fixed h) implies a smaller bandwidth, it is better to compare the 
Euclidean distance as a function of the normalized bandwidth $2WT_b$, where W is the 99 
percent power bandwidth and $T_b$ is the bit interval. Figure 4.9-9 illustrates this type 
of comparison with MSK used as a point of reference (0 dB). Note from this figure 
that there are several decibels to be gained by using partial response signals and higher 
signaling alphabets. The major price to be paid for this performance gain is the 
exponentially increasing complexity in the implementation of the Viterbi detector. 
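
The raised cosine pulse of length LT can be sketched in a few lines. This is an illustrative sketch only: the names `lrc_pulse` and `pulse_area` are assumptions, and the check that the pulse integrates to 1/2 relies on the usual CPM normalization of the frequency pulse, which is assumed here rather than stated in this passage.

```python
import math

def lrc_pulse(t, L, T=1.0):
    """Raised cosine frequency pulse of duration L*T (hypothetical helper)."""
    if 0.0 <= t <= L * T:
        return (1.0 - math.cos(2.0 * math.pi * t / (L * T))) / (2.0 * L * T)
    return 0.0

def pulse_area(L, T=1.0, steps=10000):
    """Integrate the pulse over [0, L*T] with the midpoint rule."""
    dt = L * T / steps
    return sum(lrc_pulse((i + 0.5) * dt, L, T) for i in range(steps)) * dt

for L in (1, 2, 3):
    # The area stays at 1/2 while the pulse spreads over L symbol intervals
    print(L, round(pulse_area(L), 6))
```

The constant area shows that increasing L spreads the same total phase change over more symbol intervals, which is the source of the memory exploited by the detector.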
FIGURE 4.9-8 

Upper bound $d_B^2$ on the minimum 
distance for partial response (raised 
cosine pulse) binary CPM. [From 
Sundberg (1986), © 1986 IEEE.] 



FIGURE 4.9-9 

Power bandwidth tradeoff for partial 
response CPM signals with raised cosine 
pulses. W is the 99 percent inband power 
bandwidth. [From Sundberg (1986), © 
1986 IEEE.] 
The performance results shown in Figure 4.9-9 illustrate that a 3-4 dB gain relative 
to MSK can be easily obtained with virtually no increase in bandwidth by the use of 
raised cosine partial response CPM and M = 4. Although these results are for raised 
cosine signal pulses, similar gains can be achieved with other partial response pulse 
shapes. We emphasize that this gain in SNR is achieved by introducing memory into 
the signal modulation and exploiting the memory in the demodulation of the signal. No 
redundancy through coding has been introduced. In effect, the code has been built into 
the modulation and the trellis-type (Viterbi) decoding exploits the phase constraints in 
the CPM signal. 

Additional gains in performance can be achieved by introducing additional redun- 
dancy through coding and increasing the alphabet size as a means of maintaining a fixed 
bandwidth. In particular, trellis-coded CPM using relatively simple convolutional codes 
has been thoroughly investigated and many results are available in the technical literature. 
The Viterbi decoder for the convolutionally encoded CPM signal now exploits the 
memory inherent in the code and in the CPM signal. Performance gains of the order of 
4 to 6 dB, relative to uncoded MSK with the same bandwidth, have been demonstrated 
by combining convolutional coding with CPM. Extensive numerical results for coded 
CPM are given by Lindell (1985). 

Multi-h CPM 

By varying the modulation index from one signaling interval to another, it is possible 
to increase the minimum Euclidean distance $\delta^2_{\min}$ between pairs of phase trajectories 
and, thus, improve the performance gain over constant-h CPM. Usually, multi-h CPM 
employs a fixed number H of modulation indices that are varied cyclically in successive 
signaling intervals. Thus, the phase of the signal varies piecewise linearly. 
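
The cyclic use of the modulation indices can be illustrated with a short sketch. It assumes full response CPM with a rectangular pulse, so that the phase advances linearly by $\pi h_k I_k$ over the kth symbol interval; the function name `multi_h_phase_states` and the example index set are hypothetical.

```python
import math
from itertools import cycle

def multi_h_phase_states(symbols, indices):
    """Phase at the symbol boundaries for multi-h full response CPFSK.

    Assumes a rectangular pulse, so symbol I_k with index h_k changes the
    phase by pi * h_k * I_k; the indices are reused cyclically.
    """
    phase = 0.0
    states = [phase]
    for symbol, h in zip(symbols, cycle(indices)):
        phase += math.pi * h * symbol
        states.append(phase)
    return states

# H = 2 indices applied alternately to a short binary sequence
states = multi_h_phase_states([+1, -1, +1, +1], [0.5, 0.25])
print([round(s / math.pi, 3) for s in states])  # phases in units of pi
```

Between the listed boundary values the phase is a straight line, which is the piecewise linear behavior described above.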

Significant gains in SNR are achievable by using only a small number of different 
values of h. For example, with full response (L = 1) CPM and H = 2, it is possible to 
obtain a gain of 3 dB relative to binary or quaternary PSK. By increasing H to H = 4, 
a gain of 4.5 dB relative to PSK can be obtained. The performance gain can also be 
increased with an increase in the signal alphabet. Table 4.9-1 lists the performance 
gains achieved with M = 2, 4, and 8 for several values of H. The upper bounds on the 
minimum Euclidean distance are also shown in Figure 4.9-10 for several values of M 
and H. Note that the major gain in performance is obtained when H is increased from 
H = 1 to H = 2. For H > 2, the additional gain is relatively small for small values of 
$\{h_i\}$. On the other hand, significant performance gains are achieved by increasing the 
alphabet size M. 


TABLE 4.9-1 

Maximum Values of the Upper Bound $d_B^2$ for Multi-h Linear Phase CPM^a 

M    H    Max d_B^2    dB gain compared    h_1      h_2      h_3      h_4      average h
                       with MSK
2    1    2.43         0.85                0.715                               0.715
2    2    4.0          3.0                 0.5      0.5                        0.5
2    3    4.88         3.87                0.620    0.686    0.714             0.673
2    4    5.69         4.54                0.73     0.55     0.73     0.55     0.64
4    1    4.23         3.25                0.914                               0.914
4    2    6.54         5.15                0.772    0.772                      0.772
4    3    7.65         5.83                0.795    0.795    0.795             0.795
8    1    6.14         4.87                0.964                               0.964
8    2    7.50         5.74                0.883    0.883                      0.883
8    3    8.40         6.23                0.879    0.879    0.879             0.879

^a From Aulin and Sundberg (1982b). 


FIGURE 4.9-10 

Upper bounds on minimum squared 
Euclidean distance for various M and H 
values. [From Aulin and Sundberg 
(1982b), © 1982 IEEE.] 
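
The dB gain column of Table 4.9-1 can be reproduced from the distance column, since each gain is $10\log_{10}(d_B^2/2)$ with MSK ($d_B^2 = 2$) as the reference. The sketch below is a consistency check under that assumption; the list name and helper function are illustrative.

```python
import math

# (M, H, max d_B^2, dB gain listed in Table 4.9-1)
TABLE_4_9_1 = [
    (2, 1, 2.43, 0.85), (2, 2, 4.0, 3.0), (2, 3, 4.88, 3.87), (2, 4, 5.69, 4.54),
    (4, 1, 4.23, 3.25), (4, 2, 6.54, 5.15), (4, 3, 7.65, 5.83),
    (8, 1, 6.14, 4.87), (8, 2, 7.50, 5.74), (8, 3, 8.40, 6.23),
]

def gain_over_msk_db(d2, d2_msk=2.0):
    """Gain in dB of a squared distance d2 relative to MSK (d2_msk = 2)."""
    return 10.0 * math.log10(d2 / d2_msk)

for M, H, d2, listed in TABLE_4_9_1:
    computed = gain_over_msk_db(d2)
    print(f"M={M} H={H}: computed {computed:.2f} dB, listed {listed:.2f} dB")
```

The computed values agree with the listed ones to within rounding, and they make the jump from H = 1 to H = 2 easy to see for each M.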

The results shown above hold for full response CPM. One can also extend the 
use of multi-h CPM to partial response in an attempt to further improve performance. 
It is anticipated that such schemes will yield some additional performance gains, but 
numerical results on partial response, multi-h CPM are limited. The interested reader 
is referred to the paper by Aulin and Sundberg (1982b). 


4.9-3 Suboptimum Demodulation and Detection of CPM Signals 

The high complexity inherent in the implementation of the maximum-likelihood 
sequence detector for CPM signals has been a motivating factor in the investigation of 
reduced-complexity detectors. Reduced-complexity Viterbi detectors were investigated 
by Svensson (1984), Svensson et al. (1984), Svensson and Sundberg (1983), Aulin et 
al. (1981), Simmons and Wittke (1983), Palenius and Svensson (1993), and Palenius 
(1991). The basic idea in achieving a reduced-complexity Viterbi detector is to design a 
receiver filter that has a shorter pulse than the transmitter. The receiver pulse $g_R(t)$ must 
be chosen in such a way that the phase tree generated by $g_R(t)$ is a good approximation 
of the phase tree generated by the transmitter pulse $g_T(t)$. Performance results indicate 
that a significant reduction in complexity can be achieved at a loss in performance of 
about 0.5 to 1 dB. 

Another method for reducing the complexity of the receiver for CPM signals is to 
exploit the linear representation of CPM, which can be expressed as a sum of amplitude- 
modulated pulses as given in the papers by Laurent (1986) and Mengali and Morelli 
(1995). In many cases of practical interest the CPM signal can be approximated by a 
single amplitude-modulated pulse or, perhaps, by a sum of two amplitude-modulated 
pulses. Hence, the receiver can be easily implemented based on this linear representa- 
tion of the CPM signal. The performance of such relatively simple receivers has been 
investigated by Kawas-Kaleh (1989). The results of this study indicate that such sim- 
plified receivers sacrifice little in performance but achieve a significant reduction in 
implementation complexity. 


4.10 PERFORMANCE ANALYSIS FOR WIRELINE AND RADIO COMMUNICATION SYSTEMS 

In the transmission of digital signals through an AWGN channel, we have observed that 
the performance of the communication system, measured in terms of the probability of 
error, depends solely on the received SNR, $\mathcal{E}_b/N_0$, where $\mathcal{E}_b$ is the transmitted energy 
per bit and $N_0$ is the power spectral density of the additive noise. Hence, the additive 
noise ultimately limits the performance of the communication system. 

In addition to the additive noise, another factor that affects the performance of a 
communication system is channel attenuation. All physical channels, including wire 
lines and radio channels, are lossy. Hence, the signal is attenuated as it travels through 
the channel. The simple mathematical model for the attenuation shown in Figure 4.10-1 
may be used for the channel. Consequently, if the transmitted signal is s(t), the received 
signal, with $0 < \alpha \le 1$, is 

$$r(t) = \alpha s(t) + n(t) \tag{4.10-1}$$


FIGURE 4.10-1 

Mathematical model of channel with 
attenuation and additive noise. [Block diagram: the transmitted signal s(t) is scaled 
by the attenuation α and the noise n(t) is added, giving the received signal 
r(t) = αs(t) + n(t).] 
Then, if the energy in the transmitted signal is $\mathcal{E}_b$, the energy in the received signal 
is $\alpha^2\mathcal{E}_b$. Consequently, the received signal has an SNR $\alpha^2\mathcal{E}_b/N_0$. Hence, the effect of 
signal attenuation is to reduce the energy in the received signal and thus to render the 
communication system more vulnerable to additive noise. 
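
The effect of the attenuation on the SNR is a shift of $20\log_{10}\alpha$ dB. The sketch below illustrates this with assumed numbers (a 30 dB transmitted SNR and α = 0.1, neither taken from the text), and it assumes binary antipodal signaling so that the bit-error probability is $Q(\sqrt{2\,\text{SNR}})$; all function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def received_snr_db(tx_snr_db, alpha):
    """SNR after attenuation alpha: alpha^2 * Eb/N0, expressed in dB."""
    return tx_snr_db + 20.0 * math.log10(alpha)

def antipodal_bit_error(snr_db):
    """Bit-error probability Q(sqrt(2 * SNR)), assuming binary antipodal signals."""
    snr = 10.0 ** (snr_db / 10.0)
    return q_function(math.sqrt(2.0 * snr))

tx_snr_db = 30.0   # assumed value, for illustration only
alpha = 0.1        # assumed attenuation, a 20 dB loss
rx_snr_db = received_snr_db(tx_snr_db, alpha)
print(rx_snr_db, antipodal_bit_error(rx_snr_db))
```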

In analog communication systems, amplifiers called repeaters are used to pe- 
riodically boost the signal strength in transmission through the channel. However, 
each amplifier also boosts the noise in the system. In contrast, digital communication 
systems allow us to detect and regenerate a clean (noise-free) signal in a transmission 
channel. Such devices, called regenerative repeaters, are frequently used in wireline 
and fiber-optic communication channels. 

4.10-1 Regenerative Repeaters 

The front end of each regenerative repeater consists of a demodulator/detector that 
demodulates and detects the transmitted digital information sequence sent by the pre- 
ceding repeater. Once detected, the sequence is passed to the transmitter side of the 
repeater, which maps the sequence into signal waveforms that are transmitted to the 
next repeater. This type of repeater is called a regenerative repeater. 

Since a noise-free signal is regenerated at each repeater, the additive noise does 
not accumulate. However, when errors occur in the detector of a repeater, the errors are 
propagated forward to the following repeaters in the channel. To evaluate the effect of 
errors on the performance of the overall system, suppose that the modulation is binary 
PAM, so that the probability of a bit error for one hop (signal transmission from one 
repeater to the next repeater in the chain) is 

$$P_b = Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)$$

Since errors occur with low probability, we may ignore the probability that any one bit 
will be detected incorrectly more than once in transmission through a channel with K 
repeaters. Consequently, the number of errors will increase linearly with the number 
of regenerative repeaters in the channel, and therefore, the overall probability of error 
may be approximated as 

$$P_b \approx K\,Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right) \tag{4.10-2}$$

In contrast, the use of K analog repeaters in the channel reduces the received SNR by 
K, and hence, the bit-error probability is 

$$P_b \approx Q\left(\sqrt{\frac{2\mathcal{E}_b}{K N_0}}\right) \tag{4.10-3}$$

Clearly, for the same probability of error performance, the use of regenerative repeaters 
results in a significant saving in transmitter power compared with analog repeaters. 
Hence, in digital communication systems, regenerative repeaters are preferable. However, 
in wireline telephone channels that are used to transmit both analog and digital 
signals, analog repeaters are generally employed. 

EXAMPLE 4.10-1. A binary digital communication system transmits data over a wireline 
channel of length 1000 km. Repeaters are used every 10 km to offset the effect of 
channel attenuation. Let us determine the $\mathcal{E}_b/N_0$ that is required to achieve a 
probability of a bit error of $10^{-5}$ if (a) analog repeaters are employed, and (b) regenerative 
repeaters are employed. 

The number of repeaters used in the system is K = 100. If regenerative repeaters 
are used, the $\mathcal{E}_b/N_0$ obtained from Equation 4.10-2 is 

$$10^{-5} = 100\,Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}}\right)$$

which yields approximately 11.3 dB. If analog repeaters are used, the $\mathcal{E}_b/N_0$ obtained 
from Equation 4.10-3 is 

$$10^{-5} = Q\left(\sqrt{\frac{2\mathcal{E}_b}{100 N_0}}\right)$$

which yields $\mathcal{E}_b/N_0 \approx 29.6$ dB. Hence, the difference in the required SNR is about 
18.3 dB, or approximately 70 times the transmitter power of the digital communication 
system. 
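
The two requirements in this example can be reproduced numerically. The sketch below assumes binary PAM with one-hop error probability $Q(\sqrt{2\mathcal{E}_b/N_0})$, an overall error probability of K times that value for regenerative repeaters, and an SNR reduced by K for analog repeaters; the function names are illustrative and the inverse Q-function is taken from the Python standard library.

```python
import math
from statistics import NormalDist

def q_inverse(p):
    """Return x such that Q(x) = p, where Q is the Gaussian tail probability."""
    return -NormalDist().inv_cdf(p)

def to_db(x):
    return 10.0 * math.log10(x)

def snr_regenerative(pb, K):
    """Eb/N0 solving pb = K * Q(sqrt(2 Eb/N0))."""
    x = q_inverse(pb / K)
    return x * x / 2.0

def snr_analog(pb, K):
    """Eb/N0 solving pb = Q(sqrt(2 Eb / (K N0)))."""
    x = q_inverse(pb)
    return K * x * x / 2.0

K = 1000 // 10                       # one repeater every 10 km over 1000 km
regen_db = to_db(snr_regenerative(1e-5, K))
analog_db = to_db(snr_analog(1e-5, K))
print(f"regenerative: {regen_db:.1f} dB, analog: {analog_db:.1f} dB, "
      f"difference: {analog_db - regen_db:.1f} dB")
```

The difference of roughly 18 dB corresponds to a power ratio of about 70, in line with the figure quoted in the example.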


4.10-2 Link Budget Analysis in Radio Communication Systems 

In the design of radio communication systems that transmit over line-of-sight 
microwave channels and satellite channels, the system designer must specify the size 
of the transmit and receive antennas, the transmitted power, and the SNR required to 
achieve a given level of performance at some desired data rate. The system design 
procedure is relatively straightforward and is outlined below. 

Let us begin with a transmit antenna that radiates isotropically in free space at a 
power level of $P_T$ watts as shown in Figure 4.10-2. The power density at a distance d 
from the antenna is $P_T/4\pi d^2$ W/m². If the transmitting antenna has some directivity in 
a particular direction, the power density in that direction is increased by a factor called 
the antenna gain and denoted by $G_T$. In such a case, the power density at distance d is 
$P_T G_T/4\pi d^2$ W/m². The product $P_T G_T$ is usually called the effective radiated power 
(ERP or EIRP), which is basically the radiated power relative to an isotropic antenna, 
for which $G_T = 1$. 


FIGURE 4.10-2 

Isotropically radiating antenna. 

A receiving antenna pointed in the direction of the radiated power gathers a portion 
of the power that is proportional to its cross-sectional area. Hence, the received power 
extracted by the antenna may be expressed as 


$$P_R = \frac{P_T G_T A_R}{4\pi d^2} \tag{4.10-4}$$

where $A_R$ is the effective area of the antenna. From electromagnetic field theory, we 
obtain the basic relationship between the gain $G_R$ of an antenna and its effective area 
as 

$$A_R = \frac{G_R \lambda^2}{4\pi}\ \text{m}^2 \tag{4.10-5}$$

where $\lambda = c/f$ is the wavelength of the transmitted signal, c is the speed of light 
($3 \times 10^8$ m/s), and f is the frequency of the transmitted signal. 

If we substitute Equation 4.10-5 for $A_R$ into Equation 4.10-4, we obtain an 
expression for the received power in the form 

$$P_R = \frac{P_T G_T G_R}{(4\pi d/\lambda)^2} \tag{4.10-6}$$

The factor 

$$L_s = \left(\frac{\lambda}{4\pi d}\right)^2 \tag{4.10-7}$$

is called the free-space path loss. If other losses, such as atmospheric losses, are 
encountered in the transmission of the signal, they may be accounted for by introducing 
an additional loss factor, say $L_a$. Therefore, the received power may be written 
in general as 

$$P_R = P_T G_T G_R L_s L_a \tag{4.10-8}$$


As indicated above, the important characteristics of an antenna are its gain and its 
effective area. These generally depend on the wavelength of the radiated power and 
the physical dimensions of the antenna. For example, a parabolic (dish) antenna of 
diameter D has an effective area 

$$A_R = \tfrac{1}{4}\pi D^2 \eta \tag{4.10-9}$$

where $\tfrac{1}{4}\pi D^2$ is the physical area and η is the illumination efficiency factor, which falls 
in the range $0.5 \le \eta \le 0.6$. Hence, the antenna gain for a parabolic antenna of diameter 
D is 

$$G_R = \eta\left(\frac{\pi D}{\lambda}\right)^2 \tag{4.10-10}$$
FIGURE 4.10-3 

Antenna beamwidth and pattern. 


As a second example, a horn antenna of physical area A has an efficiency factor of 0.8, 
an effective area of $A_R = 0.8A$, and an antenna gain of 

$$G_R = \frac{10A}{\lambda^2} \tag{4.10-11}$$


Another parameter that is related to the gain (directivity) of an antenna is its 
beamwidth, which we denote as $\Theta_B$ and which is illustrated graphically in Figure 
4.10-3. Usually, the beamwidth is measured as the −3 dB width of the antenna pattern. 
For example, the −3 dB beamwidth of a parabolic antenna is approximately 

$$\Theta_B = 70(\lambda/D)^\circ \tag{4.10-12}$$

so that $G_T$ is inversely proportional to $\Theta_B^2$. That is, a decrease of the beamwidth by a 
factor of 2, which is obtained by doubling the diameter D, increases the antenna gain 
by a factor of 4 (6 dB). 
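
The inverse-square relation between gain and beamwidth can be checked with a short sketch. It uses the parabolic antenna gain $\eta(\pi D/\lambda)^2$ and the approximate beamwidth $70\lambda/D$ degrees; the helper names and the sample dimensions (a 3-m dish at a 7.5-cm wavelength) are illustrative assumptions.

```python
import math

def beamwidth_deg(diameter, wavelength):
    """Approximate -3 dB beamwidth of a parabolic antenna, in degrees."""
    return 70.0 * wavelength / diameter

def parabolic_gain(diameter, wavelength, efficiency=0.5):
    """Gain eta * (pi D / lambda)^2 of a parabolic antenna."""
    return efficiency * (math.pi * diameter / wavelength) ** 2

wavelength = 0.075                       # metres (assumed: 4 GHz carrier)
for D in (3.0, 6.0):                     # doubling the dish diameter
    gain_db = 10.0 * math.log10(parabolic_gain(D, wavelength))
    print(f"D = {D} m: beamwidth {beamwidth_deg(D, wavelength):.3f} deg, "
          f"gain {gain_db:.2f} dB")
```

Doubling D halves the beamwidth and raises the gain by a factor of 4, or about 6 dB.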

Based on the general relationship for the received signal power given by Equation 
4.10-8, the system designer can compute Pr from a specification of the antenna gains 
and the distance between the transmitter and the receiver. Such computations are usually 
done on a power basis, so that 


$$(P_R)_{\text{dB}} = (P_T)_{\text{dB}} + (G_T)_{\text{dB}} + (G_R)_{\text{dB}} + (L_s)_{\text{dB}} + (L_a)_{\text{dB}} \tag{4.10-13}$$


EXAMPLE 4.10-2. Suppose that we have a satellite in geosynchronous orbit (36,000 
km above the earth’s surface) that radiates 100 W of power, i.e., 20 dB above 1 W (20 
dBW). The transmit antenna has a gain of 17 dB, so that the ERP = 37 dBW. Also, 
suppose that the earth station employs a 3-m parabolic antenna and that the downlink 
is operating at a frequency of 4 GHz. The efficiency factor is η = 0.5. By substituting 
these numbers into Equation 4.10-10, we obtain the value of the antenna gain as 39 dB. 
The free-space path loss is 

$$L_s = 195.6\ \text{dB}$$

No other losses are assumed. Therefore, the received signal power is 

$$(P_R)_{\text{dB}} = 20 + 17 + 39 - 195.6 = -119.6\ \text{dBW}$$

or, equivalently, 

$$P_R = 1.1 \times 10^{-12}\ \text{W}$$
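
The numbers in this example can be reproduced with a few lines of code. The sketch below uses the parabolic antenna gain $\eta(\pi D/\lambda)^2$ and the free-space path loss factor $(\lambda/4\pi d)^2$; the function names are illustrative assumptions, and the path loss is carried as a negative number of decibels so that all terms simply add.

```python
import math

C = 3.0e8  # speed of light in m/s, the value used in the text

def db(x):
    return 10.0 * math.log10(x)

def parabolic_gain(diameter_m, freq_hz, efficiency):
    """Gain eta * (pi D / lambda)^2 of a parabolic antenna."""
    wavelength = C / freq_hz
    return efficiency * (math.pi * diameter_m / wavelength) ** 2

def free_space_path_loss(distance_m, freq_hz):
    """Path loss factor (lambda / (4 pi d))^2, a number smaller than 1."""
    wavelength = C / freq_hz
    return (wavelength / (4.0 * math.pi * distance_m)) ** 2

pt_dbw = 20.0                                         # 100 W transmitter
gt_db = 17.0                                          # transmit antenna gain
gr_db = db(parabolic_gain(3.0, 4.0e9, 0.5))           # 3-m dish at 4 GHz
ls_db = db(free_space_path_loss(36.0e6, 4.0e9))       # 36,000 km, negative in dB
pr_dbw = pt_dbw + gt_db + gr_db + ls_db
pr_watts = 10.0 ** (pr_dbw / 10.0)
print(f"G_R = {gr_db:.1f} dB, L_s = {ls_db:.1f} dB, "
      f"P_R = {pr_dbw:.1f} dBW = {pr_watts:.2e} W")
```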


To complete the link budget computation, we must consider the effect of the 
additive noise at the receiver front end. Thermal noise that arises at the receiver front 
end has a relatively flat power density spectrum up to about $10^{12}$ Hz, and is given as 

$$N_0 = k_B T_0\ \text{W/Hz} \tag{4.10-14}$$

where $k_B$ is Boltzmann’s constant ($1.38 \times 10^{-23}$ W-s/K) and $T_0$ is the noise temperature 
in Kelvin. Therefore, the total noise power in the signal bandwidth W is $N_0 W$. 

The performance of the digital communication system is specified by the $\mathcal{E}_b/N_0$ 
required to keep the error rate performance below some given value. Since 

$$\frac{\mathcal{E}_b}{N_0} = \frac{T_b P_R}{N_0} = \frac{1}{R}\,\frac{P_R}{N_0} \tag{4.10-15}$$

it follows that 

$$\frac{P_R}{N_0} = R\left(\frac{\mathcal{E}_b}{N_0}\right)_{\text{req}} \tag{4.10-16}$$

where $(\mathcal{E}_b/N_0)_{\text{req}}$ is the required SNR per bit. Hence, if we have $P_R/N_0$ and the required 
SNR per bit, we can determine the maximum data rate that is possible. 


EXAMPLE 4.10-3. For the link considered in Example 4.10-2, the received signal power 
is 

$$P_R = 1.1 \times 10^{-12}\ \text{W} \quad (-119.6\ \text{dBW})$$

Now, suppose the receiver front end has a noise temperature of 300 K, which is typical 
for a receiver in the 4-GHz range. Then 

$$N_0 = 4.1 \times 10^{-21}\ \text{W/Hz}$$

or, equivalently, −203.9 dBW/Hz. Therefore, 

$$\frac{P_R}{N_0} = -119.6 + 203.9 = 84.3\ \text{dB Hz}$$

If the required SNR per bit is 10 dB, then, from Equation 4.10-16, we have the available 
rate as 

$$R_{\text{dB}} = 84.3 - 10 = 74.3\ \text{dB (with respect to 1 bit/s)}$$

This corresponds to a rate of 26.9 megabits/s, which is equivalent to about 420 PCM 
channels, each operating at 64,000 bits/s. 
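
The same computation can be sketched in code. The function names below are illustrative assumptions. Because the code does not round the intermediate values, it gives a noise density near −203.8 dBW/Hz and a rate somewhat below the 26.9 megabits/s quoted above, which comes from the rounded figure of −203.9 dBW/Hz.

```python
import math

K_B = 1.38e-23  # Boltzmann's constant in W-s/K

def noise_density_dbw_hz(noise_temp_k):
    """N0 = k_B * T0, expressed in dBW/Hz."""
    return 10.0 * math.log10(K_B * noise_temp_k)

def max_rate_bps(pr_dbw, n0_dbw_hz, required_snr_db):
    """Largest rate R with P_R / N0 = R * (Eb/N0)_req, computed in decibels."""
    rate_db = (pr_dbw - n0_dbw_hz) - required_snr_db
    return 10.0 ** (rate_db / 10.0)

n0 = noise_density_dbw_hz(300.0)          # 300 K receiver front end
rate = max_rate_bps(-119.6, n0, 10.0)     # P_R from Example 4.10-2, 10 dB required
print(f"N0 = {n0:.1f} dBW/Hz, R = {rate / 1e6:.1f} Mbit/s, "
      f"about {int(rate // 64000)} channels of 64,000 bits/s")
```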

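It is a good idea to introduce some safety margin, which we shall call the link 
margin $M_{\text{dB}}$, in the above computations for the capacity of the communication link. 
Typically, this may be selected as $M_{\text{dB}} = 6$ dB. Then, the link budget computation for 
the link capacity may be expressed in the simple form 

$$R_{\text{dB}} = \left(\frac{P_R}{N_0}\right)_{\text{dB Hz}} - \left(\frac{\mathcal{E}_b}{N_0}\right)_{\text{req}} - M_{\text{dB}}$$

$$= (P_T)_{\text{dBW}} + (G_T)_{\text{dB}} + (G_R)_{\text{dB}} + (L_a)_{\text{dB}} + (L_s)_{\text{dB}} - (N_0)_{\text{dBW/Hz}} - \left(\frac{\mathcal{E}_b}{N_0}\right)_{\text{req}} - M_{\text{dB}} \tag{4.10-17}$$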


4.11 BIBLIOGRAPHICAL NOTES AND REFERENCES 

In the derivation of the optimum demodulator for a signal corrupted by AWGN, we 
applied mathematical techniques that were originally used in deriving optimum receiver 
structures for radar signals. For example, the matched filter was first proposed by 
North (1943) for use in radar detection, and it is sometimes called the North filter. An 
alternative method for deriving the optimum demodulator and detector is the Karhunen- 
Loeve expansion, which is described in the classical texts by Davenport and Root 
(1958), Helstrom (1968), and Van Trees (1968). Its use in radar detection theory is 
described in the paper by Kelly et al. (1960). These detection methods are based on 
the hypothesis testing methods developed by statisticians, e.g., Neyman and Pearson 
(1933) and Wald (1947). 

The geometric approach to signal design and detection, which was presented in 
the context of digital modulation and which has its roots in Kotelnikov (1947) and 
Shannon’s original work, is conceptually appealing and is now widely used since its 
use in the text by Wozencraft and Jacobs (1965). 

Design and analysis of signal constellations for the AWGN channel have received 
considerable attention in the technical literature. Of particular significance is the 
performance analysis of two-dimensional (QAM) signal constellations that has been treated 
in the papers of Cahn (1960), Hancock and Lucky (1960), Campopiano and Glazer 
(1962), Lucky and Hancock (1962), Salz et al. (1971), Simon and Smith (1973), 
Thomas et al. (1974), and Foschini et al. (1974). Signal design based on multidimen- 
sional signal constellations has been described and analyzed in the paper by Gersho 
and Lawrence (1984). 

The Viterbi algorithm was devised by Viterbi (1967) for the purpose of decod- 
ing convolutional codes. Its use as the optimal maximum-likelihood sequence detec- 
tion algorithm for signals with memory was described by Forney (1972) and Omura 
(1971). Its use for carrier modulated signals was considered by Ungerboeck (1974) and 
MacKenchnie (1973). It was subsequently applied to the demodulation of CPM by 
Aulin and Sundberg (1981), Aulin et al. (1981), and Aulin (1980). 

Our discussion of the demodulation and detection of signals with memory refer- 
enced journal papers published primarily in the United States. The authors have recently 
learned that maximum-likelihood sequential detection algorithms for signals with mem- 
ory (introduced by the channel through intersymbol interference) were also developed 
and published in Russia during the 1960s by D. Klovsky. An English translation of 
Klovsky’s work is contained in his book coauthored with B. Nikolaev (1978). 
PROBLEMS 


4.1 Let Z(t) = X(t) + jY(t) be a complex-valued, zero-mean white Gaussian noise process 
with autocorrelation function $R_Z(\tau) = N_0\delta(\tau)$. Let $f_m(t)$, m = 1, 2, ..., M, be a set of 
M orthogonal equivalent lowpass waveforms defined on the interval 0 ≤ t ≤ T. Define 

$$N_{mr} = \operatorname{Re}\left[\int_0^T Z(t) f_m^*(t)\,dt\right], \qquad m = 1, 2, \ldots, M$$

1. Determine the variance of $N_{mr}$. 

2. Show that $E[N_{mr}N_{kr}] = 0$ for k ≠ m. 


4.2 The correlation metrics given by Equation 4.2-28 are 

$$C(\mathbf{r}, \mathbf{s}_m) = 2\sum_{n=1}^{N} r_n s_{mn} - \sum_{n=1}^{N} s_{mn}^2, \qquad m = 1, 2, \ldots, M$$

where 

$$r_n = \int_0^T r(t)\phi_n(t)\,dt$$

and 

$$s_{mn} = \int_0^T s_m(t)\phi_n(t)\,dt$$

Show that the correlation metrics are equivalent to the metrics 

$$C(\mathbf{r}, \mathbf{s}_m) = 2\int_0^T r(t)s_m(t)\,dt - \int_0^T s_m^2(t)\,dt$$


4.3 In the communication system shown in Figure P4.3, the receiver receives two signals $r_1$ 
and $r_2$, where $r_2$ is a “noisier” version of $r_1$. The two noises $n_1$ and $n_2$ are arbitrary 
(not necessarily Gaussian, and not necessarily independent). Intuition would suggest that 
since $r_2$ is noisier than $r_1$, the optimal decision can be based only on $r_1$; in other words, 
$r_2$ is irrelevant. Is this true or false? If it is true, give a proof; if it is false, provide a 
counterexample and state under what conditions this can be true. 



FIGURE P4.3 




4.4 A binary digital communication system employs the signals 

$$s_0(t) = 0, \qquad 0 \le t \le T$$

$$s_1(t) = A, \qquad 0 \le t \le T$$

for transmitting the information. This is called on-off signaling. The demodulator 
cross-correlates the received signal r(t) with s(t) and samples the output of the correlator at 
t = T. 

a. Determine the optimum detector for an AWGN channel and the optimum threshold, 
assuming that the signals are equally probable. 

b. Determine the probability of error as a function of the SNR. How does on-off signaling 
compare with antipodal signaling? 

4.5 A communication system transmits one of the three messages $m_1$, $m_2$, and $m_3$ using signals 
$s_1(t)$, $s_2(t)$, and $s_3(t)$. The signal $s_3(t) = 0$, and $s_1(t)$ and $s_2(t)$ are shown in Figure P4.5. 
The channel is an additive white Gaussian noise channel with noise power spectral density 
equal to $N_0/2$. 


FIGURE P4.5 

1. Determine an orthonormal basis for this signal set, and depict the signal constellation. 

2. If the three messages are equiprobable, what are the optimal decision rules for this 
system? Show the optimal decision regions on the signal constellation you plotted in 
part 1. 

3. If the signals are equiprobable, express the error probability of the optimal detector in 
terms of the average SNR per bit. 

4. Assuming this system transmits 3000 symbols per second, what is the resulting trans- 
mission rate (in bits per second)? 

4.6 Suppose that binary PSK is used for transmitting information over an AWGN channel with a power 
spectral density of $\tfrac{1}{2}N_0 = 10^{-10}$ W/Hz. The transmitted signal energy is $\mathcal{E}_b = \tfrac{1}{2}A^2T$, 
where T is the bit interval and A is the signal amplitude. Determine the signal amplitude 
required to achieve an error probability of $10^{-6}$ when the data rate is 

1. 10 kilobits/s 

2. 100 kilobits/s 

3. 1 megabit/s 

4.7 Consider a signal detector with an input 


r = ±A + n 
where +A and −A occur with equal probability and the noise variable n is characterized 
by the (Laplacian) PDF shown in Figure P4.7. 

1. Determine the probability of error as a function of the parameters A and σ. 

2. Determine the SNR required to achieve an error probability of $10^{-5}$. How does the SNR 
compare with the result for a Gaussian PDF? 


FIGURE P4.7 


4.8 The signal constellation for a communication system with 16 equiprobable symbols is 
shown in Figure P4.8. The channel is AWGN with noise power spectral density of $N_0/2$. 

FIGURE P4.8 

1. Using the union bound, find a bound in terms of A and $N_0$ on the error probability for 
this channel. 

2. Determine the average SNR per bit for this channel. 

3. Express the bound found in part 1 in terms of the average SNR per bit. 

4. Compare the power efficiency of this system with a 16-level PAM system. 

4.9 A ternary communication system transmits one of three equiprobable signals s(t), 0, 
or −s(t) every T seconds. The received signal is $r_l(t) = s(t) + z(t)$, $r_l(t) = z(t)$, 
or $r_l(t) = -s(t) + z(t)$, where z(t) is white Gaussian noise with E[z(t)] = 0 and 
$R_z(\tau) = E[z(t)z^*(\tau)] = 2N_0\delta(t - \tau)$. The optimum receiver computes the correlation 
metric 

$$U = \operatorname{Re}\left[\int_0^T r_l(t)s^*(t)\,dt\right]$$

and compares U with a threshold A and a threshold −A. If U > A, the decision is made 
that s(t) was sent. If U < −A, the decision is made in favor of −s(t). If −A < U < A, 
the decision is made in favor of 0. 

1. Determine the three conditional probabilities of error: $P_e$ given that s(t) was sent, $P_e$ 
given that −s(t) was sent, and $P_e$ given that 0 was sent. 

2. Determine the average probability of error $P_e$ as a function of the threshold A, assuming 
that the three symbols are equally probable a priori. 

3. Determine the value of A that minimizes $P_e$. 


4.10 The two equivalent lowpass signals shown in Figure P4.10 are used to transmit a binary 
information sequence. The transmitted signals, which are equally probable, are corrupted 
by additive zero-mean white Gaussian noise having an equivalent lowpass representation 
z(t) with an autocorrelation function 

$$R_z(\tau) = E[z^*(t)z(t + \tau)] = 2N_0\delta(\tau)$$

1. What is the transmitted signal energy? 

2. What is the probability of a binary digit error if coherent detection is employed at the 
receiver? 

3. What is the probability of a binary digit error if noncoherent detection is employed at 
the receiver? 


FIGURE P4.10 


4.11 A matched filter has the frequency response 


$$H(f) = \frac{1 - e^{-j2\pi fT}}{j2\pi f}$$

1. Determine the impulse response h(t) corresponding to H(f). 

2. Determine the signal waveform to which the filter characteristic is matched. 


4.12 Consider the signal 


$$s(t) = \begin{cases} (A/T)\,t\cos 2\pi f_c t & 0 \le t \le T \\ 0 & \text{otherwise} \end{cases}$$

1. Determine the impulse response of the matched filter for the signal. 

2. Determine the output of the matched filter at t = T. 

3. Suppose the signal s(t) is passed through a correlator that correlates the input s(t) with 
s(t). Determine the value of the correlator output at t = T. Compare your result with 
that in part 2. 


4.13 The two equivalent lowpass signals shown in Figure P4.13 are used to transmit a binary 
sequence over an additive white Gaussian noise channel. The received signal can be 
expressed as 

$$r_l(t) = s_i(t) + z(t), \qquad 0 \le t \le T, \quad i = 1, 2$$

where z(t) is a zero-mean Gaussian noise process with autocorrelation function 

$$R_z(\tau) = E[z^*(t)z(t + \tau)] = 2N_0\delta(\tau)$$

1. Determine the transmitted energy in $s_1(t)$ and $s_2(t)$ and the cross-correlation 
coefficient $\rho_{12}$. 

2. Suppose the receiver is implemented by means of coherent detection using two matched 
filters, one matched to $s_1(t)$ and the other to $s_2(t)$. Sketch the equivalent lowpass impulse 
responses of the matched filters. 


FIGURE P4.13 


3. Sketch the noise-free response of the two matched filters when the transmitted signal 
is $s_2(t)$. 

4. Suppose the receiver is implemented by means of two cross-correlators (multipliers 
followed by integrators) in parallel. Sketch the output of each integrator as a function 
of time for the interval 0 ≤ t ≤ T when the transmitted signal is $s_1(t)$. 

5. Compare the sketches in parts 3 and 4. Are they the same? Explain briefly. 

6. From your knowledge of the signal characteristics, give the probability of error for this 
binary communication system. 

4.14 A binary communication system uses two equiprobable messages $s_1(t) = p(t)$ and 
$s_2(t) = -p(t)$. The channel noise is additive white Gaussian with power spectral density $N_0/2$. 
Assume that we have designed an optimal receiver for this channel, and let the error 
probability for the optimal receiver be $P_e$. 

1. Find an expression for $P_e$. 

2. If this receiver is used on an AWGN channel using the same signals but with the noise 
power spectral density $N_1 > N_0$, find the resulting error probability $P_1$ and explain how 
its value compares with $P_e$. 

3. Let $P_{e1}$ denote the error probability in part 2 when an optimal receiver is designed for 
the new noise power spectral density $N_1$. Find $P_{e1}$ and compare it with $P_1$. 

4. Answer parts 1 and 2 if the two signals are not equiprobable but have prior probabilities 
p and 1 — p. 


4.15 Consider a quaternary ($M = 4$) communication system that transmits, every $T$ seconds, 
one of four equally probable signals: $s_1(t)$, $-s_1(t)$, $s_2(t)$, $-s_2(t)$. The signals $s_1(t)$ and 
$s_2(t)$ are orthogonal with equal energy. The additive noise is white Gaussian with zero 
mean and autocorrelation function $R_z(\tau) = (N_0/2)\, \delta(\tau)$. The demodulator consists of two 
filters matched to $s_1(t)$ and $s_2(t)$, and their outputs at the sampling instant are $U_1$ and $U_2$. 
The detector bases its decision on the following rule: 

$$ U_1 > |U_2| \Rightarrow s_1(t) \qquad U_1 < -|U_2| \Rightarrow -s_1(t) $$

$$ U_2 > |U_1| \Rightarrow s_2(t) \qquad U_2 < -|U_1| \Rightarrow -s_2(t) $$

Since the signal set is biorthogonal, the error probability is given by $(1 - P_c)$, where 
$P_c$ is given by Equation 4.4-26. Express this error probability in terms of a single 
integral, and thus show that the symbol error probability for a biorthogonal signal set with 
$M = 4$ is identical to that for four-phase PSK. Hint: A change in variables from $U_1$ and 
$U_2$ to $W_1 = U_1 + U_2$ and $W_2 = U_1 - U_2$ simplifies the problem. 


4.16 The input $s(t)$ to a bandpass filter is 

$$ s(t) = \operatorname{Re}\left[ s_0(t)\, e^{j 2\pi f_c t} \right] $$

where $s_0(t)$ is a rectangular pulse as shown in Figure P4.16(a). 

1. Determine the output $y(t)$ of the bandpass filter for all $t \ge 0$ if the impulse response 
of the filter is 

$$ g(t) = \operatorname{Re}\left[ h(t)\, e^{j 2\pi f_c t} \right] $$

where $h(t)$ is an exponential as shown in Figure P4.16(b). 

2. Sketch the equivalent lowpass output of the filter. 

3. When would you sample the output of the filter if you wished to have the maximum 
output at the sampling instant? What is the value of the maximum output? 

4. Suppose that in addition to the input signal $s(t)$, there is additive white Gaussian noise 

$$ n(t) = \operatorname{Re}\left[ z(t)\, e^{j 2\pi f_c t} \right] $$

where $R_z(\tau) = 2N_0\, \delta(\tau)$. At the sampling instant determined in part 3, the signal sample 
is corrupted by an additive Gaussian noise term. Determine its mean and variance. 

5. What is the signal-to-noise ratio $\gamma$ of the sampled output? 

6. Determine the signal-to-noise ratio when $h(t)$ is the matched filter to $s(t)$, and compare 
this result with the value of $\gamma$ obtained in part 5. 



FIGURE P4.16 


4.17 Consider the equivalent lowpass (complex-valued) signal $s_l(t)$, $0 \le t \le T$, with energy 

$$ \mathcal{E} = \int_0^T |s_l(t)|^2\, dt $$

Suppose that this signal is corrupted by AWGN, which is represented by its equivalent 
lowpass form $z(t)$. Hence, the observed signal is 

$$ r_l(t) = s_l(t) + z(t), \qquad 0 \le t \le T $$

The received signal is passed through a filter that has an (equivalent lowpass) impulse 
response $h_l(t)$. Determine $h_l(t)$ so that the filter maximizes the SNR at its output (at 
$t = T$). 

4.18 In Section 3.2-4 it was shown that the minimum frequency separation for orthogonality 
of binary FSK signals with coherent detection is $\Delta f = 1/2T$. However, a lower error 
probability is possible with coherent detection of FSK if $\Delta f$ is increased beyond $1/2T$. 
Show that the optimum value of $\Delta f$ is $0.715/T$, and determine the probability of error for 
this value of $\Delta f$. 
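
The claimed optimum can be checked numerically. The sketch below assumes the usual model for 
coherently detected, equal-energy binary FSK with $f_c \gg 1/T$: the correlation coefficient of 
the two tones is $\rho = \sin(2\pi \Delta f T)/(2\pi \Delta f T)$ and the error probability is 
$Q(\sqrt{\mathcal{E}_b(1-\rho)/N_0})$. The function names and the grid search are illustrative 
choices, not part of the text. 

```python
import math

def rho(x):
    """Correlation coefficient of the two FSK tones; x = delta_f * T (assumed model)."""
    if x == 0.0:
        return 1.0
    a = 2.0 * math.pi * x
    return math.sin(a) / a

def best_separation(lo=0.5, hi=1.0, steps=50000):
    """Grid search for the x = delta_f * T in [lo, hi] that minimizes rho(x)."""
    best_x = lo
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        if rho(x) < rho(best_x):
            best_x = x
    return best_x

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pe_coherent_fsk(ebn0, x):
    """Error probability for linear Eb/N0 = ebn0 and normalized separation x."""
    return q_func(math.sqrt(ebn0 * (1.0 - rho(x))))

x_opt = best_separation()
print(round(x_opt, 3), round(rho(x_opt), 4))
```

Comparing `pe_coherent_fsk(ebn0, x_opt)` with `pe_coherent_fsk(ebn0, 0.5)` shows the gain over 
orthogonal signaling at any chosen SNR. 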

4.19 The equivalent lowpass waveforms for three signal sets are shown in Figure P4.19. Each 
set may be used to transmit one of four equally probable messages over an additive white 
Gaussian noise channel. The equivalent lowpass noise $z(t)$ has zero mean and 
autocorrelation function $R_z(\tau) = 2N_0\, \delta(\tau)$. 

1. Classify the signal waveforms in sets I, II, III. In other words, state the category or class 
to which each signal set belongs. 

2. What is the average transmitted energy for each signal set? 

3. For signal set I, specify the average probability of error if the signals are detected 
coherently. 

4. For signal set II, give a union bound on the probability of a symbol error if the detection 
is performed (i) coherently and (ii) noncoherently. 

5. Is it possible to use noncoherent detection on signal set III? Explain. 

6. Which signal set or signal sets would you select if you wished to achieve a spectral bit 
rate ($r = R/W$) of at least 2? Explain your answer. 

4.20 For the QAM signal constellation shown in Figure P4.20, determine the optimum decision 
boundaries for the detector, assuming that the SNR is sufficiently high that errors occur 
only between adjacent points. 



FIGURE P4.20 

4.21 Two quadrature carriers $\cos 2\pi f_c t$ and $\sin 2\pi f_c t$ are used to transmit digital information 
through an AWGN channel at two different data rates, 10 kilobits/s and 100 kilobits/s. 
Determine the relative amplitudes of the signals for the two carriers so that $\mathcal{E}_b/N_0$ for the 
two channels is identical. 

4.22 When the additive noise at the input to the demodulator is colored, the filter matched 
to the signal no longer maximizes the output SNR. In such a case we may consider the 
use of a prefilter that “whitens” the colored noise. The prefilter is followed by a filter 
matched to the prefiltered signal. Toward this end, consider the configuration shown in 
Figure P4.22. 

1. Determine the frequency response characteristic of the prefilter that whitens the noise, 
in terms of $S_n(f)$, the noise power spectral density. 

2. Determine the frequency response characteristic of the filter matched to s(t). 

3. Consider the prefilter and the matched filter as a single “generalized matched filter.” 
What is the frequency response characteristic of this filter? 

4. Determine the SNR at the input to the detector. 



FIGURE P4.22 


4.23 Consider a digital communication system that transmits information via QAM over a 
voice-band telephone channel at a rate of 2400 symbols/s. The additive noise is assumed 
to be white and Gaussian. 

1. Determine the $\mathcal{E}_b/N_0$ required to achieve an error probability of $10^{-5}$ at 4800 bits/s. 

2. Repeat part 1 for a rate of 9600 bits/s. 

3. Repeat part 1 for a rate of 19,200 bits/s. 

4. What conclusions do you reach from these results? 


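4.24 Three equiprobable messages $m_1$, $m_2$, and $m_3$ are to be transmitted over an AWGN channel 
with noise power spectral density $\frac{1}{2}N_0$. The messages are 

$$ s_1(t) = \begin{cases} 1 & 0 \le t \le T \\ 0 & \text{otherwise} \end{cases} $$

$$ s_2(t) = -s_3(t) = \begin{cases} 1 & 0 \le t \le \frac{1}{2}T \\ -1 & \frac{1}{2}T < t \le T \\ 0 & \text{otherwise} \end{cases} $$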

1. What is the dimensionality of the signal space? 

2. Find an appropriate basis for the signal space. 

3. Draw the signal constellation for this problem. 

4. Derive and sketch the optimal decision regions $R_1$, $R_2$, and $R_3$. 

5. Which of the three messages is most vulnerable to errors and why? In other words, 
which of $P(\text{error} \mid m_i \text{ transmitted})$, $i = 1, 2, 3$, is largest? 


4.25 A QPSK communication system over an AWGN channel uses one of the four equiprobable 
signals $s_i(t) = A \cos(2\pi f_c t + i\pi/2)$, where $i = 0, 1, 2, 3$, $f_c$ is the carrier frequency, 
and the duration of each signal is $T$. The power spectral density of the channel noise is 
$N_0/2$. 

1. Express the message error probability of this system in terms of $A$, $T$, and $N_0$ (an 
approximate expression is sufficient). 

2. If Gray coding is used, what is the bit error probability in terms of the same parameters 
used in part 1? 

3. What is the minimum (theoretical minimum) required transmission bandwidth for this 
communication system? 

4. If, instead of QPSK, binary FSK is used with $s_1(t) = B \cos 2\pi f_c t$ and 
$s_2(t) = B \cos 2\pi (f_c + \Delta f) t$, where the duration of the signals is now $T_1$ and 
$\Delta f = 1/(2T_1)$, determine the required $T_1$ and $B$ in terms of $T$ and $A$ to achieve the 
same bit rate and the same bit error probability as the QPSK system described in parts 1-3. 

4.26 A binary signaling scheme over an AWGN channel with noise power spectral density of 
$N_0/2$ uses the equiprobable messages shown in Figure P4.26 and is operating at a bit rate of 
$R$ bits/s. 

FIGURE P4.26 

1. What is $\mathcal{E}_b/N_0$ for this system (in terms of $N_0$ and $R$)? 

2. What is the error probability for this system (in terms of $N_0$ and $R$)? 

3. By how many decibels does this system underperform a binary antipodal signaling 
system with the same $\mathcal{E}_b/N_0$? 

4. Now assume that this system is augmented with two more signals $s_3(t) = -s_1(t)$ 
and $s_4(t) = -s_2(t)$ to result in a 4-ary equiprobable system. What is the resulting 
transmission bit rate? 

5. Using the union bound, find a bound on the error probability of the 4-ary system 
introduced in part 4. 

4.27 The four signals shown in Figure P4.27 are used for communication of four equiprobable 
messages over an AWGN channel. The noise power spectral density is $N_0/2$. 

1. Find an orthonormal basis, with lowest possible $N$, for representation of the signals. 

2. Plot the constellation, and using the constellation, find the energy in each signal. What 
is the average signal energy and what is $\mathcal{E}_{b,\text{avg}}$? 

3. On the constellation that you have plotted, determine the optimal decision regions for 
each signal, and determine which signal is more probable to be received in error. 

4. Now analytically (i.e., not geometrically) determine the shape of the decision region 
for signal $s_1(t)$, i.e., $D_1$, and compare it with your result in part 3. 



FIGURE P4.27 (panels: $s_1(t)$, $s_2(t)$, $s_3(t)$, ...) 

4.28 Consider the four-phase and eight-phase signal constellations shown in Figure P4.28. 
Determine the radii $r_1$ and $r_2$ of the circles such that the distance between two adjacent 
points in the two constellations is $d$. From this result, determine the additional transmitted 
energy required in the 8-PSK signal to achieve the same error probability as the four-phase 
signal at high SNR, where the probability of error is determined by errors in selecting 
adjacent points. 
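
The geometry in this problem reduces to a chord-length calculation, sketched below under the 
assumption that the points of each constellation are equally spaced on a circle, so that 
adjacent points at distance $d$ on a circle of radius $r$ satisfy $d = 2r\sin(\pi/M)$, and 
that transmitted energy is proportional to $r^2$. The helper name `psk_radius` is 
illustrative only. 

```python
import math

def psk_radius(M, d):
    """Radius of an M-point PSK circle whose adjacent points are d apart."""
    return d / (2.0 * math.sin(math.pi / M))

d = 1.0
r1 = psk_radius(4, d)   # four-phase constellation
r2 = psk_radius(8, d)   # eight-phase constellation

# Energy scales with the squared radius, so the penalty in dB is:
extra_energy_db = 10.0 * math.log10((r2 / r1) ** 2)
print(r1, r2, extra_energy_db)
```

The ratio does not depend on the value chosen for `d`, since both radii scale linearly with it. 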



4.29 Digital information is to be transmitted by carrier modulation through an additive 
Gaussian noise channel with a bandwidth of 100 kHz and $N_0 = 10^{-10}$ W/Hz. Determine the 
maximum rate that can be transmitted through the channel for four-phase PSK, binary 
FSK, and four-frequency orthogonal FSK, which is detected noncoherently. 


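4.30 A continuous-phase FSK signal with $h = \frac{1}{2}$ is represented as 

$$ s(t) = \pm \sqrt{\frac{2\mathcal{E}_b}{T_b}} \cos\left(\frac{\pi t}{2T_b}\right) \cos 2\pi f_c t \pm \sqrt{\frac{2\mathcal{E}_b}{T_b}} \sin\left(\frac{\pi t}{2T_b}\right) \sin 2\pi f_c t, \qquad 0 \le t \le 2T_b $$

where the $\pm$ signs depend on the information bits transmitted. 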

1. Show that this signal has constant envelope. 

2. Sketch a block diagram of the modulator for synthesizing the signal. 

3. Sketch a block diagram of the demodulator and detector for recovering the information. 

4.31 Consider a biorthogonal signal set with $M = 8$ signal points. Determine a union bound 
for the probability of a symbol error as a function of $\mathcal{E}_b/N_0$. The signal points are equally 
likely a priori. 
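
One way to evaluate such a bound numerically is sketched below. It assumes equal-energy 
biorthogonal signals in AWGN with power spectral density $N_0/2$, so each signal has $M - 2$ 
neighbors at squared distance $2\mathcal{E}_s$ and one antipodal neighbor at squared distance 
$4\mathcal{E}_s$, and each pairwise error probability is $Q(d/\sqrt{2N_0})$. The function 
names are illustrative. 

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def biorthogonal_union_bound(M, ebn0):
    """Union bound on the symbol error probability; ebn0 is linear (not dB)."""
    k = math.log2(M)
    esn0 = k * ebn0  # E_s = k * E_b
    # M - 2 orthogonal neighbors (d^2 = 2 E_s) plus one antipodal neighbor (d^2 = 4 E_s).
    return (M - 2) * q_func(math.sqrt(esn0)) + q_func(math.sqrt(2.0 * esn0))

for ebn0_db in (4, 6, 8):
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    print(ebn0_db, biorthogonal_union_bound(8, ebn0))
```

For $M = 2$ the orthogonal term vanishes and the expression reduces to the exact antipodal 
result, which is a convenient sanity check. 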

4.32 Consider an $M$-ary digital communication system where $M = 2^N$, and $N$ is the dimension 
of the signal space. Suppose that the $M$ signal vectors lie on the vertices of a hypercube 
that is centered at the origin. Determine the average probability of a symbol error as a 
function of $\mathcal{E}_s/N_0$, where $\mathcal{E}_s$ is the energy per symbol, $\frac{1}{2}N_0$ is the power spectral density of 
the AWGN, and all signal points are equally probable. 
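
A short numerical sketch of one route to the answer follows. It assumes the optimum detector 
decides each of the $N$ coordinates independently by sign, with each coordinate carrying 
energy $\mathcal{E}_s/N$ and noise variance $N_0/2$, so the symbol is correct only if all $N$ 
coordinate decisions are correct. The function names are illustrative. 

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def hypercube_symbol_error(N, esn0):
    """Symbol error probability for 2**N hypercube vertices; esn0 is linear E_s/N_0."""
    p = q_func(math.sqrt(2.0 * esn0 / N))  # error probability along one coordinate
    return 1.0 - (1.0 - p) ** N

for N in (1, 2, 3, 4):
    print(N, hypercube_symbol_error(N, 10.0))
```

With $N = 1$ the expression reduces to binary antipodal signaling and with $N = 2$ to 
four-phase PSK, which gives two quick checks on the formula. 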

4.33 Consider the signal waveform 

$$ s(t) = \sum_{i=1}^{n} c_i\, p(t - iT_c) $$

where $p(t)$ is a rectangular pulse of unit amplitude and duration $T_c$. The $\{c_i\}$ may be 
viewed as a code vector $c = (c_1\ c_2\ \cdots\ c_n)$, where the elements $c_i = \pm 1$. Show that the 
filter matched to the waveform $s(t)$ may be realized as a cascade of a filter matched to 
$p(t)$ followed by a discrete-time filter matched to the vector $c$. Determine the value of the 
output of the matched filter at the sampling instant $t = nT_c$. 

4.34 A Hadamard matrix is defined as a matrix whose elements are $\pm 1$ and whose row vectors 
are pairwise orthogonal. In the case when $n$ is a power of 2, an $n \times n$ Hadamard matrix is 
constructed by means of the recursion given by Equation 3.2-59. 

1. Let $c_i$ denote the $i$th row of an $n \times n$ Hadamard matrix. Show that the waveforms 
constructed as 

$$ s_i(t) = \sum_{k=1}^{n} c_{ik}\, p(t - kT_c), \qquad i = 1, 2, \ldots, n $$

are orthogonal, where $p(t)$ is an arbitrary pulse confined to the time interval $0 \le t \le T_c$. 

2. Show that the matched filters (or cross-correlators) for the $n$ waveforms $\{s_i(t)\}$ can be 
realized by a single filter (or correlator) matched to the pulse $p(t)$ followed by a set of 
$n$ cross-correlators using the code words $\{c_i\}$. 
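
The orthogonality in part 1 can be illustrated at the code-word level. The sketch below 
assumes that Equation 3.2-59 is the usual doubling (Sylvester) recursion 
$H_{2n} = \begin{bmatrix} H_n & H_n \\ H_n & -H_n \end{bmatrix}$ with $H_1 = [1]$. Because the 
shifted pulses $p(t - kT_c)$ do not overlap, the inner product of two waveforms equals the 
pulse energy times the inner product of the corresponding rows, so checking the rows is 
enough. The names `hadamard` and `inner` are illustrative. 

```python
def hadamard(n):
    """Build an n x n Hadamard matrix by repeated doubling; n must be a power of 2."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = [[1]]
    while len(H) < n:
        # Top half: [H H]; bottom half: [H -H].
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def inner(u, v):
    """Inner product of two code words."""
    return sum(a * b for a, b in zip(u, v))

H = hadamard(8)
gram = [[inner(r, s) for s in H] for r in H]
for row in gram:
    print(row)
```

The printed Gram matrix is $n$ on the diagonal and zero elsewhere, which is the code-word 
statement of pairwise orthogonality. 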

4.35 The discrete sequence 

$$ r_k = \sqrt{\mathcal{E}}\, c_k + n_k, \qquad k = 1, 2, \ldots, n $$

represents the output sequence of samples from a demodulator, where $c_k = \pm 1$ are elements 
of one of two possible code words, $c_1 = [1\ 1\ \cdots\ 1]$ and $c_2 = [1\ 1\ \cdots\ 1\ {-1}\ \cdots\ {-1}]$. 
The code word $c_2$ has $w$ elements that are $+1$ and $n - w$ elements that are $-1$, where $w$ 
is some positive integer. The noise sequence $\{n_k\}$ is white Gaussian with variance $\sigma^2$. 

1. What is the optimum maximum-likelihood detector for the two possible transmitted 
signals? 

2. Determine the probability of error as a function of the parameters $(\sigma^2, \mathcal{E}, w)$. 

3. What is the value of $w$ that minimizes the error? 

4.36 In on-off keying of a carrier modulated signal, the two possible signals are 

$$ s_0(t) = 0, \qquad s_1(t) = \sqrt{\frac{2\mathcal{E}_b}{T_b}} \cos 2\pi f_c t, \qquad 0 \le t \le T_b $$

The corresponding received signals are 

$$ r(t) = n(t), \qquad 0 \le t \le T_b $$

$$ r(t) = \sqrt{\frac{2\mathcal{E}_b}{T_b}} \cos(2\pi f_c t + \phi) + n(t), \qquad 0 \le t \le T_b $$

where $\phi$ is the carrier phase and $n(t)$ is AWGN. 

1. Sketch a block diagram of the receiver (demodulator and detector) that employs non- 
coherent (envelope) detection. 

2. Determine the PDFs for the two possible decision variables at the detector corresponding 
to the two possible received signals. 

3. Derive the probability of error for the detector. 


4.37 This problem deals with the characteristics of a DPSK signal. 

1. Suppose we wish to transmit the data sequence 

110100010110 

by binary DPSK. Let $s(t) = A \cos(2\pi f_c t + \theta)$ represent the transmitted signal in any 
signaling interval of duration $T$. Give the phase of the transmitted signal for the data 
sequence. Begin with $\theta = 0$ for the phase of the first bit to be transmitted. 

2. If the data sequence is uncorrelated, determine and sketch the power density spectrum 
of the signal transmitted by DPSK. 


4.38 In two-phase DPSK, the received signal in one signaling interval is used as a phase reference 
for the received signal in the following signaling interval. The decision variable is 

$$ D = \operatorname{Re}(V_m V_{m-1}^*) \underset{\text{``0''}}{\overset{\text{``1''}}{\gtrless}} 0 $$

where 

$$ V_k = 2\mathcal{E} e^{j(\theta_k - \phi)} + N_k $$

represents the complex-valued output of the filter matched to the transmitted signal $u(t)$; 
$N_k$ is a complex-valued Gaussian variable having zero mean and statistically independent 
components. 

1. Writing $V_k = X_k + jY_k$, show that $D$ is equivalent to 

$$ D = \left[\tfrac{1}{2}(X_m + X_{m-1})\right]^2 + \left[\tfrac{1}{2}(Y_m + Y_{m-1})\right]^2 - \left[\tfrac{1}{2}(X_m - X_{m-1})\right]^2 - \left[\tfrac{1}{2}(Y_m - Y_{m-1})\right]^2 $$

2. For mathematical convenience, suppose that $\theta_k = \theta_{k-1}$. Show that the random variables 
$U_1$, $U_2$, $U_3$, and $U_4$ are statistically independent Gaussian variables, where 
$U_1 = \frac{1}{2}(X_m + X_{m-1})$, $U_2 = \frac{1}{2}(Y_m + Y_{m-1})$, $U_3 = \frac{1}{2}(X_m - X_{m-1})$, and 
$U_4 = \frac{1}{2}(Y_m - Y_{m-1})$. 

3. Define the random variables $W_1 = U_1^2 + U_2^2$ and $W_2 = U_3^2 + U_4^2$. Then 

$$ D = W_1 - W_2 \underset{\text{``0''}}{\overset{\text{``1''}}{\gtrless}} 0 $$

Determine the probability density functions for $W_1$ and $W_2$. 

4. Determine the probability of error $P_b$, where 

$$ P_b = P(D < 0) = P(W_1 - W_2 < 0) = \int_0^\infty P(W_2 > w_1 \mid w_1)\, p(w_1)\, dw_1 $$

4.39 Assuming that it is desired to transmit information at the rate of R bits/s, determine the 
required transmission bandwidth of each of the following six communication systems, and 
arrange them in order of bandwidth efficiency, starting from the most bandwidth-efficient 
and ending at the least bandwidth-efficient. 

1. Orthogonal BFSK 

2. 8PSK 

3. QPSK 

4. 64-QAM 

5. BPSK 

6. Orthogonal 16-FSK 

4.40 In a binary communication system over an additive white Gaussian noise channel, two 
messages represented by antipodal signals $s_1(t)$ and $s_2(t) = -s_1(t)$ are transmitted. The 
probabilities of the two messages are $p$ and $1 - p$, respectively, where $0 \le p \le 1/2$. The 
energy content of each message is denoted by $\mathcal{E}$, and the noise power spectral density 
is $N_0/2$. 

1. What is the expression for the threshold value $r_{th}$ such that for $r > r_{th}$ the optimal detector 
makes a decision in favor of $s_1(t)$? What is the expression for the error probability? 

2. Now assume that with probability of 1/2 the link between the transmitter and the receiver 
is out of service and with a probability of 1/2 this link remains in service. When the 
link is out of service, the receiver receives only noise. The receiver does not know 
whether the link is in service. What is the structure of the optimal receiver in this case? 
In particular, what is the value of the threshold $r_{th}$ in this case? What is the value of the 
threshold if $p = 1/2$? What is the resulting error probability for this case ($p = 1/2$)? 

4.41 A digital communication system with two equiprobable messages uses the following 
signals: 

$$ s_1(t) = \begin{cases} 1 & 0 \le t < 1 \\ 2 & 1 \le t < 2 \\ 0 & \text{otherwise} \end{cases} \qquad s_2(t) = \begin{cases} 1 & 0 \le t < 1 \\ -2 & 1 \le t < 2 \\ 0 & \text{otherwise} \end{cases} $$

1. Assuming that the channel is AWGN with noise power spectral density $N_0/2$, determine 
the error probability of the optimal receiver and express it in terms of $\mathcal{E}_b/N_0$. By how 
many decibels does this system underperform a binary antipodal signaling system? 

2. Assume that we are using the two-path channel shown in Figure P4.41, 
in which we receive both $r_1(t)$ and $r_2(t)$ at the receiver. Both $n_1(t)$ and $n_2(t)$ are 
independent white Gaussian processes each with power spectral density $N_0/2$. The receiver 
observes both $r_1(t)$ and $r_2(t)$ and makes its decision based on this observation. 
Determine the structure of the optimal receiver and the error probability in this case. 

FIGURE P4.41 

3. Now assume that $r_1(t) = A s_m(t) + n_1(t)$ and $r_2(t) = s_m(t) + n_2(t)$, where $m$ is the 
transmitted message and $A$ is a random variable uniformly distributed over the interval 
$[0, 1]$. Assuming that the receiver knows the value of $A$, what is its optimal decision 
rule? What is the error probability in this case? (Note: This last question, regarding the 
error probability, is asked of you, and you do not know the value of $A$.) 

4. If the receiver does not know the value of $A$, what is its optimal decision rule? 

4.42 Two equiprobable messages $m_1$ and $m_2$ are to be transmitted through a channel with input 
$X$ and output $Y$ related by $Y = \rho X + N$, where $N$ is a zero-mean Gaussian noise with 
variance $\sigma^2$ and $\rho$ is a random variable independent of the noise. 

1. Assuming an antipodal signaling scheme ($X = \pm A$) and a constant $\rho = 1$, what is the 
optimal decision rule and the resulting error probability? 

2. With antipodal signaling, if $\rho$ takes $\pm 1$ with equal probability, what will be the optimal 
decision rule and the resulting error probability? 

3. With antipodal signaling, if $\rho$ takes 0 and 1 with equal probability, what will be the 
optimal decision rule and the resulting error probability? 

4. Assuming an on-off signaling ($X = 0$ or $A$) and $\rho$ taking $\pm 1$ with equal probability, 
what will be the optimal decision rule? 

4.43 A binary communication scheme uses two equiprobable messages $m = 1, 2$ corresponding 
to signals $s_1(t)$ and $s_2(t)$, where 

$$ s_1(t) = x(t) $$

$$ s_2(t) = x(t - 1) $$

and $x(t)$ is shown in Figure P4.43. 

FIGURE P4.43 

The power spectral density of the noise is $N_0/2$. 

1. Design an optimal matched filter receiver for this system. Carefully label the diagram 
and determine all the required parameters. 

2. Determine the error probability for this communication system. 

3. Show that the receiver can be implemented using only one matched filter. 

4. Now assume that $s_1(t) = x(t)$ and 

$$ s_2(t) = \begin{cases} x(t - 1) & \text{with probability } 0.5 \\ x(t) & \text{with probability } 0.5 \end{cases} $$

In other words, in this case for $m = 1$ the transmitter always sends $x(t)$, but for $m = 2$ 
it is equally likely to send either $x(t)$ or $x(t - 1)$. Determine the optimal detection rule 
for this case, and find the corresponding error probability. 


4.44 Let $X$ denote a Rayleigh distributed random variable, i.e., 

$$ f_X(x) = \begin{cases} \dfrac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)} & x \ge 0 \\ 0 & x < 0 \end{cases} $$

1. Determine $E[Q(\beta X)]$, where $\beta$ is a positive constant. (Hint: Use the definition of the 
$Q$ function and change the order of integration.) 

2. In a binary antipodal signaling, let the received energy be subject to a Rayleigh 
distributed attenuation; i.e., let the received signal be $r(t) = \alpha s_m(t) + n(t)$, and therefore, 
$P_b = Q\left(\sqrt{2\alpha^2 \mathcal{E}_b/N_0}\right)$, where $\alpha^2$ denotes the power attenuation and $\alpha$ has a Rayleigh PDF 
similar to $X$. Determine the average error probability of this system. 

3. Repeat part 2 for a binary orthogonal system in which $P_b = Q\left(\sqrt{\alpha^2 \mathcal{E}_b/N_0}\right)$. 

4. Find approximations for the results of parts 2 and 3 with the assumption that 
$\sigma^2 \mathcal{E}_b/N_0 \gg 1$, and show that in this case both average error probabilities are 
proportional to $1/\text{SNR}$, where $\text{SNR} = 2\sigma^2 \mathcal{E}_b/N_0$. 

5. Now find the average of $e^{-\beta\alpha^2}$, where $\beta$ is a positive constant and $\alpha$ is a random variable 
distributed as $f_X(x)$. Find an approximation in this case when $\beta\sigma^2 \gg 1$. We will later 
see that this corresponds to the error probability of a noncoherent system in fading 
channels. 


4.45 In a binary communication system two equiprobable messages $s_1 = (1, 1)$ and 
$s_2 = (-1, -1)$ are used. The received signal is $r = s + n$, where $n = (n_1, n_2)$. It is assumed 
that $n_1$ and $n_2$ are independent and each is distributed according to 

$$ f(n) = \tfrac{1}{2}\, e^{-|n|} $$

Determine and plot the decision regions $D_1$ and $D_2$ in this communication scheme. 


4.46 Two equiprobable messages are transmitted via an additive white Gaussian noise channel 
with noise power spectral density of $N_0/2 = 1$. The messages are transmitted by the following 
two signals 

$$ s_1(t) = \begin{cases} 1 & 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases} $$

and $s_2(t) = s_1(t - 1)$. It is intended to implement the receiver by using a correlation-type 
structure, but due to imperfections in the design of the correlators, the structure shown 
in Figure P4.46 has been implemented. The imperfection appears in the integrator in the 
upper branch where instead of $\int_0^1$ we have $\int_0^{1.5}$. The decision device, therefore, observes $r_1$ 
and $r_2$ and based on this observation has to decide which message was transmitted. What 
decision rule should be adopted by the decision device for an optimal decision? 

FIGURE P4.46 


4.47 A baseband digital communication system employs the signals shown in Figure P4.47(a) 
for transmission of two equiprobable messages. It is assumed the communication problem 
studied here is a “one-shot” communication problem; i.e., the above messages are 
transmitted just once, and no transmission takes place afterward. The channel has no attenuation, 
and the noise is additive white Gaussian with power spectral density $N_0/2$. 

1. Find an appropriate orthonormal basis for the representation of the signals. 

2. In a block diagram, give the precise specifications of the optimal receiver using matched 
filters. Label the block diagram carefully. 

3. Find the error probability of the optimal receiver. 

4. Show that the optimal receiver can be implemented by using just one filter (see block 
diagram shown in Figure P4.47(b)). What are the characteristics of the matched filter 
and the sampler and decision device? 

5. Now assume the channel is not ideal, but has an impulse response of 
$c(t) = \delta(t) + \frac{1}{2}\delta(t - T/2)$. Using the same matched filter you used in part 4, derive 
the optimal decision rule. 

6. Assuming that the channel impulse response is $c(t) = \delta(t) + a\,\delta(t - T/2)$, where $a$ is 
a random variable uniformly distributed on $[0, 1]$, and using the same matched filter, 
derive the optimal decision rule. 

4.48 A binary communication system uses antipodal signals $s_1(t) = s(t)$ and $s_2(t) = -s(t)$ 
for transmission of two equiprobable messages $m_1$ and $m_2$. The block diagram of the 
communication system is given in Figure P4.48. 

Message $s_i(t)$ is transmitted through two paths to a single receiver, and the receiver 
makes its decision based on the observation of both received signals $r_1(t)$ and $r_2(t)$. 
However, the upper channel is connected by a switch $S$ which can either be closed or open. 
When the switch is open, $r_1(t) = n_1(t)$; i.e., the first channel provides only noise to the 
receiver. The switch is open or closed randomly with equal probability, but during the 
transmission it will not change position. Throughout this problem, it is assumed that the 
two noise processes are stationary, zero-mean, independent, white and Gaussian processes 
each with a power spectral density of $N_0/2$. 

1. If the receiver does not know the position of the switch, determine the optimal decision 
rule. 

2. Now assume that the receiver knows the position of the switch (the switch is still equally 
likely to be open or closed). What is the optimal decision rule in this case, and what is 
the resulting error probability? 

3. In this part assume that both the transmitter and the receiver know the position of the 
switch (which is still equally likely to be open or closed). Assume that in this case the 
transmitter has a certain level of energy that it can transmit. To be more specific, assume 
that in the upper arm $\alpha s_i(t)$ and in the lower arm $\beta s_i(t)$ is transmitted, where $\alpha, \beta \ge 0$ 
and $\alpha^2 + \beta^2 = 2$. What is the best power allocation strategy by the transmitter (i.e., 
what is the best choice for $\alpha$ and $\beta$), what is the optimal decision rule at the receiver, 
and what is the resulting error probability? 

4.49 The block diagram of a two-path communication system is shown in Figure P4.49. In 
the first path noise $n_1(t)$ is added to the transmitted signal. In the second path the signal 
is subject to a random amplification $A$ and additive noise $n_2(t)$. The random variable $A$ 
takes values $\pm 1$ with equal probability. The transmitted signal is binary antipodal, and 
the two messages are equiprobable. Both $n_1(t)$ and $n_2(t)$ are zero-mean, white, Gaussian 
noise processes with power spectral densities $N_1/2$ and $N_2/2$, respectively. The receiver 
observes both $r_1(t)$ and $r_2(t)$. 




FIGURE P4.49 

1. Assuming that the two noise processes are independent, determine the structure of the 
optimum receiver and find an expression for the error probability. 

2. Now assume $N_1 = N_2 = 2$ and $E[n_1 n_2] = 1/2$, where $n_1$ and $n_2$ denote the projections 
of $n_1(t)$ and $n_2(t)$ on the unit signal in the direction of $s(t)$ (obviously the two noise 
processes are dependent). Determine the structure of the optimum receiver in this case. 

3. What is the structure of the optimal receiver if the noise processes are independent 
and the receiver has access to $r(t) = r_1(t) + r_2(t)$ instead of observing $r_1(t)$ and $r_2(t)$ 
separately? 

4. Determine the optimal decision rule if the two noise processes are independent and $A$ 
can take 0 and 1 with equal probability [receiver has access to both $r_1(t)$ and $r_2(t)$]. 

5. What is the optimal detection rule in part 4 if we assume that the upper link is similar 
to the lower link but with $A$ substituted with random variable $B$ where $B = 1 - A$ (the 
lower link remains unchanged)? 

4.50 A fading channel can be represented by the vector channel model $\mathbf{r} = \alpha \mathbf{s}_m + \mathbf{n}$, where $\alpha$ 
is a random variable denoting the fading, whose density function is given by the Rayleigh 
distribution 

$$ f(\alpha) = \begin{cases} 2\alpha e^{-\alpha^2} & \alpha \ge 0 \\ 0 & \alpha < 0 \end{cases} $$

1. Assuming that equiprobable signals, binary antipodal signaling, and coherent detection 
are employed, what is the structure of the optimal receiver? 

2. Show that the bit error probability in this case can be written as 

$$ P_b = \frac{1}{2}\left(1 - \sqrt{\frac{\mathcal{E}_b/N_0}{1 + \mathcal{E}_b/N_0}}\right) $$

and for large SNR values we have 

$$ P_b \approx \frac{1}{4\mathcal{E}_b/N_0} $$

3. Assuming an error probability of $10^{-5}$ is desirable, determine the required SNR per bit 
(in dB) if (i) the channel is nonfading and (ii) the channel is a fading channel. How much 
more power is required by the fading channel to achieve the same bit error probability? 

4. Show that if binary orthogonal signaling and noncoherent detection are employed, we 
have 

$$ P_b = \frac{1}{2 + \mathcal{E}_b/N_0} $$
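
The comparison asked for in part 3 can be explored numerically. The sketch below rests on 
stated assumptions: the fading density is taken as $f(\alpha) = 2\alpha e^{-\alpha^2}$ (unit mean-square 
value), the conditional error probability of coherent antipodal signaling is 
$Q(\sqrt{2\alpha^2 \mathcal{E}_b/N_0})$, and the closed form 
$\frac{1}{2}(1 - \sqrt{\gamma/(1+\gamma)})$ with $\gamma = \mathcal{E}_b/N_0$ is the expression being checked. All function 
names are illustrative. 

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pb_fading_closed_form(g):
    """Candidate closed form for the average error probability; g = Eb/N0 (linear)."""
    return 0.5 * (1.0 - math.sqrt(g / (1.0 + g)))

def pb_fading_numeric(g, upper=8.0, steps=20000):
    """Midpoint-rule average of Q(sqrt(2 a^2 g)) over f(a) = 2 a exp(-a^2)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) * h
        total += 2.0 * a * math.exp(-a * a) * q_func(a * math.sqrt(2.0 * g))
    return total * h

def required_snr_db(pb_func, target, lo=1e-3, hi=1e9):
    """Bisection on a log scale for the SNR at which pb_func equals target."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if pb_func(mid) > target:
            lo = mid   # error still too high: more SNR is needed
        else:
            hi = mid
    return 10.0 * math.log10(math.sqrt(lo * hi))

awgn_db = required_snr_db(lambda g: q_func(math.sqrt(2.0 * g)), 1e-5)
fading_db = required_snr_db(pb_fading_closed_form, 1e-5)
print(awgn_db, fading_db, fading_db - awgn_db)
```

Calling `pb_fading_numeric(g)` next to `pb_fading_closed_form(g)` for a few values of `g` 
gives an independent check on the closed form before it is used in the bisection. 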

4.51 A multiple access channel (MAC) is a channel with two transmitters and one receiver. 
The two transmitters transmit two messages, and the receiver is interested in correct 
detection of both messages. A block diagram of such a system in the AWGN case is shown in 
Figure P4.51. 

FIGURE P4.51 

The messages are independent binary equiprobable random variables, and both 
modulators use binary antipodal signaling schemes. We have $s_1(t) = \pm g_1(t)$ and $s_2(t) = \pm g_2(t)$ 
depending on the values of $m_1$ and $m_2$, and $g_1(t)$ and $g_2(t)$ are two unit energy pulses each 
with duration $T$ ($g_1(t)$ and $g_2(t)$ are not necessarily orthogonal). The received signal is 
$r(t) = s_1(t) + s_2(t) + n(t)$, where $n(t)$ is a white Gaussian process with a power spectral 
density of $N_0/2$. 

1. What is the structure of the receiver that minimizes $P(\hat{m}_1 \ne m_1)$ and $P(\hat{m}_2 \ne m_2)$? 

2. What is the structure of the receiver that minimizes $P((\hat{m}_1, \hat{m}_2) \ne (m_1, m_2))$? 

3. Between receivers designed in parts 1 and 2, which would you label as the real optimal 
receiver? Which has a simpler structure? 

4. What are the minimum error probabilities $p_1$ and $p_2$ for the receiver in part 1 and $p_{12}$ 
for the receiver in part 2? 


4.52 The constellation for an MPSK modulation system is shown in Figure P4.52. Only point 
$s_1$ and its decision region are shown here. The shaded area (extended to infinity) shows 
the error region when $s_1$ is transmitted. 

1. Express $R$ in terms of $\mathcal{E}$, $\theta$, and $M$. 

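2. Using the value of $R$ and integrating over the gray area, show that the error probability 
for this system can be written as 

$$ P_e = \frac{1}{\pi} \int_0^{\pi - \pi/M} \exp\left(-\frac{\mathcal{E}}{N_0}\, \frac{\sin^2(\pi/M)}{\sin^2\theta}\right) d\theta $$
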
3. Find the error probability for $M = 2$, and by equating it with the error probability of 
BPSK, conclude that $Q(x)$ can be expressed as 

$$ Q(x) = \frac{1}{\pi} \int_0^{\pi/2} e^{-x^2/(2\sin^2\theta)}\, d\theta $$

FIGURE P4.52 
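
The finite-range integral form of $Q(x)$ that part 3 leads to, 
$Q(x) = \frac{1}{\pi}\int_0^{\pi/2} \exp\left(-x^2/(2\sin^2\theta)\right) d\theta$ for $x \ge 0$, can be compared 
numerically with the usual definition through the complementary error function. The sketch 
below uses a plain midpoint rule; the function names and the step count are illustrative 
choices. 

```python
import math

def q_erfc(x):
    """Q(x) from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_finite_integral(x, steps=20000):
    """Midpoint-rule value of (1/pi) * int_0^{pi/2} exp(-x^2 / (2 sin^2 t)) dt, x >= 0."""
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoints avoid t = 0, where sin(t) vanishes
        total += math.exp(-x * x / (2.0 * math.sin(t) ** 2))
    return total * h / math.pi

for x in (0.5, 1.0, 2.0, 3.0):
    print(x, q_erfc(x), q_finite_integral(x))
```

The two columns agree closely for every tested argument, which is consistent with the identity 
but is of course not a proof of it. 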

4.53 A communication system employs $M$ signals $\{s_m(t)\}_{m=1}^{M}$ for transmission of $M$ 
equiprobable messages. The receiver has two antennas and receives two signals $r_1(t) = s_m(t) + n_1(t)$ 
and $r_2(t) = s_m(t) + n_2(t)$ by these antennas. Both $n_1(t)$ and $n_2(t)$ are white Gaussian 
noises with power spectral densities $N_{01}/2$ and $N_{02}/2$, respectively. The receiver makes 
its optimal detection based on the observation of both $r_1(t)$ and $r_2(t)$. It is further assumed 
that the two noise processes are independent. 

FIGURE P4.53 

1. Determine the optimal decision rule for this receiver. 

2. Assuming $N_{01} = N_{02} = N_0$, determine the optimal receiver structure. 

3. Show that under the assumption of part 2, the receiver needs to know only $r_1(t) + r_2(t)$. 

4. Now assume the system is binary and employs on-off signaling (i.e., $s_1(t) = s(t)$ and 
$s_2(t) = 0$), and show that the optimal decision rule consists of comparing $r_1 + \alpha r_2$ with 
a threshold. Determine $\alpha$ and the threshold (in this part you are assuming noise powers 
are different). 

5. Show that in part 4, if noise powers are equal, then $\alpha = 1$, and determine the error 
probability in this case. How does this system compare with a system that has only one 
antenna, i.e., receives only $r_1(t)$? 

4.54 A communication system employs binary antipodal signals with

$$s_1(t) = \begin{cases} 1 & 0 \le t \le 1 \\ 0 & \text{otherwise} \end{cases}$$

and $s_2(t) = -s_1(t)$. The received signal consists of a direct component, a scattered
component, and the additive white Gaussian noise. The scattered component is a delayed
version of the basic signal times a random amplification $A$. In other words, we have
$r(t) = s(t) + A s(t - 1) + n(t)$, where $s(t)$ is the transmitted message, $A$ is an exponential
random variable, and $n(t)$ is a white Gaussian noise with a power spectral density of $N_0/2$.
It is assumed that the time delay of the multipath component is constant (equal to 1) and
$A$ and $n(t)$ are independent. The two messages are equiprobable and

$$f_A(a) = \begin{cases} e^{-a} & a > 0 \\ 0 & \text{otherwise} \end{cases}$$

1. What is the optimal decision rule for this problem? Simplify the resulting rule as much
as you can. 

2. How does the error probability of this system compare with the error probability of a 
system which does not involve multipath? Which one has a better performance? 

4.55 A binary communication system uses equiprobable signals $s_1(t)$ and $s_2(t)$

$$s_1(t) = \sqrt{2\mathcal{E}_b}\, \phi_1(t) \cos(2\pi f_c t)$$
$$s_2(t) = \sqrt{2\mathcal{E}_b}\, \phi_2(t) \cos(2\pi f_c t)$$

for transmission of two equiprobable messages. It is assumed that $\phi_1(t)$ and $\phi_2(t)$ are
orthonormal. The channel is AWGN with noise power spectral density of $N_0/2$.

1. Determine the optimal error probability for this system, using a coherent detector.

2. Assuming that the demodulator has a phase ambiguity between 0 and $\theta$ ($0 \le \theta \le \pi$)
in carrier recovery, and employs the same detector as in part 1, what is the resulting
worst-case error probability?

3. What is the answer to part 2 in the special case where $\theta = \pi/2$?


4.56 In this problem we show that the volume of an $n$-dimensional sphere with radius $R$, defined
by the set of all $x \in \mathbb{R}^n$ such that $\|x\| \le R$, is given by $V_n(R) = B_n R^n$, where

$$B_n = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}$$

1. Using change of variables, show that

$$V_n(R) = \int \cdots \int_{x_1^2 + x_2^2 + \cdots + x_n^2 \le R^2} dx_1 \, dx_2 \cdots dx_n = B_n R^n$$

where $B_n$ is the volume of an $n$-dimensional sphere of radius 1, i.e., $B_n = V_n(1)$.

2. Consider $n$ iid Gaussian random variables $Y_i$, $i = 1, 2, \ldots, n$, each distributed according
to $\mathcal{N}(0, 1)$. Show that the probability that $Y = (Y_1, Y_2, \ldots, Y_n)$ would lie in the
area between two spheres of radii $R$ and $R - \epsilon$, where $\epsilon > 0$ is very small such that
$\epsilon / R \ll 1$, can be approximated as

$$P[R - \epsilon \le \|Y\| \le R] = p(y)\,[V_n(R) - V_n(R - \epsilon)] \approx \frac{\epsilon\, n R^{n-1} B_n}{(2\pi)^{n/2}}\, e^{-R^2/2}$$

3. Note that $p(y)$ is a function of $\|y\|$. From this show that we can also approximate
$P[R - \epsilon \le \|Y\| \le R]$ as

$$P[R - \epsilon \le \|Y\| \le R] \approx p_{\|Y\|}(R)\, \epsilon$$

where $p_{\|Y\|}(\cdot)$ denotes the PDF of $\|Y\|$.

4. From parts 2 and 3 conclude that

$$p_{\|Y\|}(r) = \frac{n r^{n-1} B_n}{(2\pi)^{n/2}}\, e^{-r^2/2}$$

5. Using the fact that $p_{\|Y\|}(r)$ is a PDF and therefore its integral over the positive real line
is equal to 1, conclude that

$$\frac{n B_n}{(2\pi)^{n/2}} \int_0^{\infty} r^{n-1} e^{-r^2/2} \, dr = 1$$

6. Using the definition of the gamma function given by Equation 2.3-22 as

$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t} \, dt, \qquad x > 0$$

show that

$$\int_0^{\infty} r^{n-1} e^{-r^2/2} \, dr = 2^{\frac{n}{2} - 1}\, \Gamma\left(\frac{n}{2}\right)$$

and conclude that

$$B_n = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}$$


4.57 Let $\mathbb{Z}^n + \frac{1}{2}$ denote the $n$-dimensional integer lattice shifted by 1/2, and let
$\mathcal{R}$ be an $n$-dimensional hypercube centered at the origin with side length $L$ which defines
the boundary of this lattice. We further assume that $n$ is even and $L = 2^{\ell}$ is a power of 2;
the number of bits per two dimensions is denoted by $\beta$, and we consider a constellation $C$
based on the intersection of the shifted lattice $\mathbb{Z}^n + \frac{1}{2}$ and the boundary region
$\mathcal{R}$ defined as an $n$-dimensional hypercube centered at the origin with side length $L$.

1. Show that $\beta = 2\ell + 2$.

2. Show that for this constellation the figure of merit is approximated by

$$\mathrm{CFM}(C) \approx \frac{6}{2^{\beta}}$$

Note that this is equal to the CFM for a square QAM constellation.

3. Show the shaping gain of $\mathcal{R}$ is given by $\gamma_s(\mathcal{R}) = 1$.
4.58 Recall that MSK can be represented as a four-phase offset PSK modulation having the
lowpass equivalent form

$$v(t) = \sum_k \left[ I_k u(t - 2kT_b) + j J_k u(t - 2kT_b - T_b) \right]$$

where

$$u(t) = \begin{cases} \sin\dfrac{\pi t}{2T_b} & 0 \le t \le 2T_b \\ 0 & \text{otherwise} \end{cases}$$

and $\{I_k\}$ and $\{J_k\}$ are sequences of information symbols ($\pm 1$).

1. Sketch the block diagram of an MSK demodulator for offset QPSK.

2. Evaluate the performance of the four-phase demodulator for AWGN if no account is
taken of the memory in the modulation.

3. Compare the performance obtained in part 2 with that for Viterbi decoding of the MSK
signal.

4. The MSK signal is also equivalent to binary FSK. Determine the performance of
noncoherent detection of the MSK signal. Compare your result with parts 2 and 3.

4.59 Consider a transmission line channel that employs $n - 1$ regenerative repeaters plus the
terminal receiver in the transmission of binary information. Assume that the probability of
error at the detector of each receiver is $p$ and that errors among repeaters are statistically
independent.

1. Show that the binary error probability at the terminal receiver is

$$P_n = \frac{1}{2}\left[ 1 - (1 - 2p)^n \right]$$

2. If $p = 10^{-6}$ and $n = 100$, determine an approximate value of $P_n$.

4.60 A digital communication system consists of a transmission line with 100 digital
(regenerative) repeaters. Binary antipodal signals are used for transmitting the information. If the
overall end-to-end error probability is $10^{-6}$, determine the probability of error for each
repeater and the required $\mathcal{E}_b/N_0$ to achieve this performance in AWGN.

4.61 A radio transmitter has a power output of $P_T = 1$ W at a frequency of 1 GHz. The
transmitting and receiving antennas are parabolic dishes with diameter $D = 3$ m.

1. Determine the antenna gains.

2. Determine the EIRP for the transmitter.

3. The distance (free space) between the transmitting and receiving antennas is 20 km.
Determine the signal power at the output of the receiving antenna in decibels.

4.62 A radio communication system transmits at a power level of 0.1 W at 1 GHz. The
transmitting and receiving antennas are parabolic, each having a diameter of 1 m. The receiver
is located 30 km from the transmitter.

1. Determine the gains of the transmitting and receiving antennas.

2. Determine the EIRP of the transmitted signal.

3. Determine the signal power from the receiving antenna.

4.63 A satellite in synchronous orbit is used to communicate with an earth station at a distance
of 40,000 km. The satellite has an antenna with a gain of 15 dB and a transmitter power
of 3 W. The earth station uses a 10-m parabolic antenna with an efficiency of 0.6. The
frequency band is at $f = 1$ GHz. Determine the received power level at the output of the
receiver antenna.

4.64 A spacecraft located 100,000 km from the earth is sending data at a rate of $R$ bits/s. The
frequency band is centered at 2 GHz, and the transmitted power is 10 W. The earth station
uses a parabolic antenna, 50 m in diameter, and the spacecraft has an antenna with a gain
of 10 dB. The noise temperature of the receiver front end is $T_0 = 300$ K.

1. Determine the received power level.

2. If the desired $\mathcal{E}_b/N_0 = 10$ dB, determine the maximum bit rate that the spacecraft can
transmit.

4.65 A satellite in geosynchronous orbit is used as a regenerative repeater in a digital
communication system. Consider the satellite-to-earth link in which the satellite antenna has a
gain of 6 dB and the earth station antenna has a gain of 50 dB. The downlink is operated
at a center frequency of 4 GHz, and the signal bandwidth is 1 MHz. If the required $\mathcal{E}_b/N_0$
for reliable communication is 15 dB, determine the transmitted power for the satellite
downlink. Assume that $N_0 = 4.1 \times 10^{-21}$ W/Hz.



Carrier and Symbol Synchronization 


We have observed that in a digital communication system, the output of the demodulator
must be sampled periodically, once per symbol interval, in order to recover the
transmitted information. Since the propagation delay from the transmitter to the receiver
is generally unknown at the receiver, symbol timing must be derived from the
received signal in order to synchronously sample the output of the demodulator.

The propagation delay in the transmitted signal also results in a carrier offset, which 
must be estimated at the receiver if the detector is phase-coherent. In this chapter, we 
consider methods for deriving carrier and symbol synchronization at the receiver. 


5.1 SIGNAL PARAMETER ESTIMATION

Let us begin by developing a mathematical model for the signal at the input to the re- 
ceiver. We assume that the channel delays the signals transmitted through it and corrupts 
them by the addition of Gaussian noise. Hence, the received signal may be expressed as 

$$r(t) = s(t - \tau) + n(t)$$

where

$$s(t) = \mathrm{Re}\left[ s_l(t) e^{j2\pi f_c t} \right] \tag{5.1-1}$$

and where $\tau$ is the propagation delay and $s_l(t)$ is the equivalent low-pass signal.

The received signal may be expressed as

$$r(t) = \mathrm{Re}\left\{ \left[ s_l(t - \tau) e^{j\phi} + z(t) \right] e^{j2\pi f_c t} \right\} \tag{5.1-2}$$

where the carrier phase $\phi$, due to the propagation delay $\tau$, is $\phi = -2\pi f_c \tau$. Now, from
this formulation, it may appear that there is only one signal parameter to be estimated,
namely, the propagation delay, since one can determine $\phi$ from knowledge of $f_c$ and $\tau$.
However, this is not the case. First of all, the oscillator that generates the carrier signal
for demodulation at the receiver is generally not synchronous in phase with that at the
transmitter. Furthermore, the two oscillators may be drifting slowly with time, perhaps
in different directions. Consequently, the received carrier phase is not only dependent
on the time delay $\tau$. Furthermore, the precision to which one must synchronize in time
for the purpose of demodulating the received signal depends on the symbol interval
$T$. Usually, the estimation error in estimating $\tau$ must be a relatively small fraction of
$T$. For example, $\pm 1$ percent of $T$ is adequate for practical applications. However, this
level of precision is generally inadequate for estimating the carrier phase, even if $\phi$
depends only on $\tau$. This is due to the fact that $f_c$ is generally large, and, hence, a small
estimation error in $\tau$ causes a large phase error.
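
To put rough numbers on this argument, the following Python sketch evaluates the phase error $2\pi f_c \, \Delta\tau$ produced by a timing error $\Delta\tau$ equal to 1 percent of $T$. The carrier frequency and symbol interval are hypothetical values chosen only for illustration, and the function name is not from the text.

```python
import math

def phase_error_from_timing_error(fc_hz, symbol_interval_s, timing_error_fraction):
    """Carrier phase error (radians) implied by phi = -2*pi*fc*tau for a timing error.

    timing_error_fraction is the timing error expressed as a fraction of T.
    """
    delta_tau = timing_error_fraction * symbol_interval_s
    return 2.0 * math.pi * fc_hz * delta_tau

# Hypothetical numbers: 1 GHz carrier, 1 microsecond symbol interval, 1% timing error.
err_rad = phase_error_from_timing_error(1e9, 1e-6, 0.01)
err_cycles = err_rad / (2.0 * math.pi)
print(f"phase error = {err_rad:.2f} rad = {err_cycles:.1f} carrier cycles")
```

Under these assumed values, a timing error that is perfectly acceptable for symbol sampling corresponds to several full carrier cycles of phase uncertainty, which is why the phase cannot simply be computed from the delay estimate.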

In effect, we must estimate both parameters $\tau$ and $\phi$ in order to demodulate and
coherently detect the received signal. Hence, we may express the received signal as

$$r(t) = s(t; \phi, \tau) + n(t) \tag{5.1-3}$$

where $\phi$ and $\tau$ represent the signal parameters to be estimated. To simplify the notation,
we let $\boldsymbol{\theta}$ denote the parameter vector $\{\phi, \tau\}$, so that $s(t; \phi, \tau)$ is simply denoted by
$s(t; \boldsymbol{\theta})$.

There are basically two criteria that are widely applied to signal parameter estimation:
the maximum-likelihood (ML) criterion and the maximum a posteriori probability
(MAP) criterion. In the MAP criterion, the signal parameter vector $\boldsymbol{\theta}$ is modeled
as random and characterized by an a priori probability density function $p(\boldsymbol{\theta})$. In the
maximum-likelihood criterion, the signal parameter vector $\boldsymbol{\theta}$ is treated as deterministic
but unknown.

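By performing an orthonormal expansion of $r(t)$ using $N$ orthonormal functions
$\{\phi_n(t)\}$, we may represent $r(t)$ by the vector of coefficients $(r_1 \; r_2 \; \cdots \; r_N) \equiv \mathbf{r}$. The joint
PDF of the random variables $(r_1 \; r_2 \; \cdots \; r_N)$ in the expansion can be expressed as $p(\mathbf{r} \mid \boldsymbol{\theta})$.
Then, the ML estimate of $\boldsymbol{\theta}$ is the value that maximizes $p(\mathbf{r} \mid \boldsymbol{\theta})$. On the other hand,
the MAP estimate is the value of $\boldsymbol{\theta}$ that maximizes the a posteriori probability density
function

$$p(\boldsymbol{\theta} \mid \mathbf{r}) = \frac{p(\mathbf{r} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathbf{r})} \tag{5.1-4}$$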


We note that if there is no prior knowledge of the parameter vector $\boldsymbol{\theta}$, we may
assume that $p(\boldsymbol{\theta})$ is uniform (constant) over the range of values of the parameters. In
such a case, the value of $\boldsymbol{\theta}$ that maximizes $p(\mathbf{r} \mid \boldsymbol{\theta})$ also maximizes $p(\boldsymbol{\theta} \mid \mathbf{r})$. Therefore,
the MAP and ML estimates are identical.
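
The equivalence of the two criteria under a uniform prior can be checked numerically. The sketch below is only an illustration under simplifying assumptions that are not in the text: a scalar parameter, a single observation equal to the parameter plus Gaussian noise, a grid search in place of analytic maximization, and a made-up non-uniform prior for contrast.

```python
import math

def gaussian_likelihood(r, theta, sigma):
    """p(r | theta) for a scalar observation r = theta + zero-mean Gaussian noise."""
    return math.exp(-(r - theta) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical setup: candidate parameter values on a grid and one noisy observation.
grid = [i * 0.01 for i in range(-100, 101)]          # theta in [-1, 1]
r_obs, sigma = 0.37, 0.5

likelihood = [gaussian_likelihood(r_obs, th, sigma) for th in grid]
uniform_prior = [1.0 / len(grid)] * len(grid)
# A non-uniform prior concentrated near theta = 0 (illustrative only, unnormalized).
peaked_prior = [math.exp(-th ** 2 / (2.0 * 0.2 ** 2)) for th in grid]

theta_ml = grid[argmax(likelihood)]
theta_map_uniform = grid[argmax([l * p for l, p in zip(likelihood, uniform_prior)])]
theta_map_peaked = grid[argmax([l * p for l, p in zip(likelihood, peaked_prior)])]
print(theta_ml, theta_map_uniform, theta_map_peaked)
```

With the uniform prior the posterior is a scaled copy of the likelihood, so both maximizations pick the same grid point; the peaked prior pulls the MAP estimate toward zero. The denominator $p(\mathbf{r})$ is omitted because it does not depend on the parameter.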

In our treatment of parameter estimation given below, we view the parameters $\phi$ and
$\tau$ as unknown, but deterministic. Hence, we adopt the ML criterion for estimating them.

In the ML estimation of signal parameters, we require that the receiver extract 
the estimate by observing the received signal over a time interval $T_0 \ge T$, which is
called the observation interval. Estimates obtained from a single observation interval are 
sometimes called one-shot estimates. In practice, however, the estimation is performed 
on a continuous basis by using tracking loops (either analog or digital) that continuously 
update the estimates. Nevertheless, one-shot estimates yield insight for tracking loop 
implementation. In addition, they prove useful in the analysis of the performance of ML 
estimation, and their performance can be related to that obtained with a tracking loop. 
5.1-1 The Likelihood Function


Although it is possible to derive the parameter estimates based on the joint PDF of the
random variables $(r_1 \; r_2 \; \cdots \; r_N)$ obtained from the expansion of $r(t)$, it is convenient to
deal directly with the signal waveforms when estimating their parameters. Hence, we
shall develop a continuous-time equivalent of the maximization of $p(\mathbf{r} \mid \boldsymbol{\theta})$.

Since the additive noise $n(t)$ is white and zero-mean Gaussian, the joint PDF $p(\mathbf{r} \mid \boldsymbol{\theta})$
may be expressed as

$$p(\mathbf{r} \mid \boldsymbol{\theta}) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{N} \exp\left\{ -\sum_{n=1}^{N} \frac{[r_n - s_n(\boldsymbol{\theta})]^2}{2\sigma^2} \right\} \tag{5.1-5}$$

where

$$r_n = \int_{T_0} r(t)\, \phi_n(t) \, dt, \qquad s_n(\boldsymbol{\theta}) = \int_{T_0} s(t; \boldsymbol{\theta})\, \phi_n(t) \, dt \tag{5.1-6}$$

where $T_0$ represents the integration interval in the expansion of $r(t)$ and $s(t; \boldsymbol{\theta})$.

We note that the argument in the exponent may be expressed in terms of the signal
waveforms $r(t)$ and $s(t; \boldsymbol{\theta})$, by substituting from Equation 5.1-6 into Equation 5.1-5.
That is,

$$\lim_{N \to \infty} \frac{1}{2\sigma^2} \sum_{n=1}^{N} [r_n - s_n(\boldsymbol{\theta})]^2 = \frac{1}{N_0} \int_{T_0} [r(t) - s(t; \boldsymbol{\theta})]^2 \, dt \tag{5.1-7}$$

where the proof is left as an exercise for the reader (see Problem 5.1). Now, the
maximization of $p(\mathbf{r} \mid \boldsymbol{\theta})$ with respect to the signal parameters $\boldsymbol{\theta}$ is equivalent to the
maximization of the likelihood function

$$\Lambda(\boldsymbol{\theta}) = \exp\left\{ -\frac{1}{N_0} \int_{T_0} [r(t) - s(t; \boldsymbol{\theta})]^2 \, dt \right\} \tag{5.1-8}$$

Below, we shall consider signal parameter estimation from the viewpoint of maximizing
$\Lambda(\boldsymbol{\theta})$.


5.1-2 Carrier Recovery and Symbol Synchronization 
in Signal Demodulation 

Symbol synchronization is required in every digital communication system which trans- 
mits information synchronously. Carrier recovery is required if the signal is detected 
coherently. 

Figure 5.1-1 illustrates the block diagram of a binary PSK (or binary PAM) signal
demodulator and detector. As shown, the carrier phase estimate $\hat{\phi}$ is used in generating
the reference signal $g(t)\cos(2\pi f_c t + \hat{\phi})$ for the correlator. The symbol synchronizer
controls the sampler and the output of the signal pulse generator. If the signal pulse is
rectangular, then the signal generator can be eliminated.

FIGURE 5.1-1
Block diagram of a binary PSK receiver.

The block diagram of an $M$-ary PSK demodulator is shown in Figure 5.1-2. In this
case, two correlators (or matched filters) are required to correlate the received signal
with the two quadrature carrier signals $g(t)\cos(2\pi f_c t + \hat{\phi})$ and $g(t)\sin(2\pi f_c t + \hat{\phi})$,
where $\hat{\phi}$ is the carrier phase estimate. The detector is now a phase detector, which
compares the received signal phases with the possible transmitted signal phases.

FIGURE 5.1-2
Block diagram of an M-ary PSK receiver.

The block diagram of a PAM signal demodulator is shown in Figure 5.1-3. In this
case, a single correlator is required, and the detector is an amplitude detector, which
compares the received signal amplitude with the possible transmitted signal amplitudes.
Note that we have included an automatic gain control (AGC) at the front end of the
demodulator to eliminate channel gain variations, which would affect the amplitude
detector. The AGC has a relatively long time constant, so that it does not respond to the
signal amplitude variations that occur on a symbol-by-symbol basis. Instead, the AGC
maintains a fixed average (signal plus noise) power at its output.

FIGURE 5.1-3
Block diagram of an M-ary PAM receiver.

Finally, we illustrate the block diagram of a QAM demodulator in Figure 5.1-4.
As in the case of PAM, an AGC is required to maintain a constant average power signal
at the input to the demodulator. We observe that the demodulator is similar to a PSK
demodulator, in that both generate in-phase and quadrature signal samples $(X, Y)$ for
the detector. In the case of QAM, the detector computes the Euclidean distance between
the received noise-corrupted signal point and the $M$ possible transmitted points, and
selects the signal closest to the received point.

FIGURE 5.1-4
Block diagram of a QAM receiver.


5.2 CARRIER PHASE ESTIMATION

There are two basic approaches for dealing with carrier synchronization at the receiver. 
One is to multiplex, usually in frequency, a special signal, called a pilot signal, that 
allows the receiver to extract and, thus, to synchronize its local oscillator to the carrier 
frequency and phase of the received signal. When an unmodulated carrier component 
is transmitted along with the information-bearing signal, the receiver employs a phase- 
locked loop (PLL) to acquire and track the carrier component. The PLL is designed 
to have a narrow bandwidth so that it is not significantly affected by the presence of 
frequency components from the information-bearing signal. 

The second approach, which appears to be more prevalent in practice, is to derive 
the carrier phase estimate directly from the modulated signal. This approach has the 
distinct advantage that the total transmitter power is allocated to the transmission of 
the information-bearing signal. In our treatment of carrier recovery, we confine our 
attention to the second approach; hence, we assume that the signal is transmitted via 
suppressed carrier. 

In order to emphasize the importance of extracting an accurate phase estimate, 
let us consider the effect of a carrier phase error on the demodulation of a double- 
sideband, suppressed carrier (DSB/SC) signal. To be specific, suppose we have an 
amplitude-modulated signal of the form 

$$s(t) = A(t)\cos(2\pi f_c t + \phi) \tag{5.2-1}$$

If we demodulate the signal by multiplying $s(t)$ with the carrier reference

$$c(t) = \cos(2\pi f_c t + \hat{\phi}) \tag{5.2-2}$$

we obtain

$$c(t)s(t) = \tfrac{1}{2} A(t)\cos(\phi - \hat{\phi}) + \tfrac{1}{2} A(t)\cos(4\pi f_c t + \phi + \hat{\phi})$$

The double-frequency component may be removed by passing the product signal
$c(t)s(t)$ through a low-pass filter. This filtering yields the information-bearing signal

$$y(t) = \tfrac{1}{2} A(t)\cos(\phi - \hat{\phi}) \tag{5.2-3}$$

Note that the effect of the phase error $\phi - \hat{\phi}$ is to reduce the signal level in voltage
by a factor $\cos(\phi - \hat{\phi})$ and in power by a factor $\cos^2(\phi - \hat{\phi})$. Hence, a phase error of
10° results in a signal power loss of 0.13 dB, and a phase error of 30° results in a signal
power loss of 1.25 dB in an amplitude-modulated signal.
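
The two loss figures quoted above follow directly from the $\cos^2$ power factor. A minimal Python check (the function name is ours, not the text's) is:

```python
import math

def power_loss_db(phase_error_deg):
    """Signal power loss in dB caused by a carrier phase error, from the cos^2 factor."""
    return -10.0 * math.log10(math.cos(math.radians(phase_error_deg)) ** 2)

for err_deg in (10.0, 30.0):
    print(f"{err_deg:.0f} deg -> {power_loss_db(err_deg):.2f} dB")
```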
The effect of carrier phase errors in QAM and multiphase PSK is much more
severe. The QAM and M-PSK signals may be represented as

$$s(t) = A(t)\cos(2\pi f_c t + \phi) - B(t)\sin(2\pi f_c t + \phi) \tag{5.2-4}$$

This signal is demodulated by the two quadrature carriers

$$c_i(t) = \cos(2\pi f_c t + \hat{\phi}), \qquad c_q(t) = -\sin(2\pi f_c t + \hat{\phi}) \tag{5.2-5}$$

Multiplication of $s(t)$ with $c_i(t)$ followed by low-pass filtering yields the in-phase
component

$$y_I(t) = \tfrac{1}{2} A(t)\cos(\phi - \hat{\phi}) - \tfrac{1}{2} B(t)\sin(\phi - \hat{\phi}) \tag{5.2-6}$$

Similarly, multiplication of $s(t)$ by $c_q(t)$ followed by low-pass filtering yields the
quadrature component

$$y_Q(t) = \tfrac{1}{2} B(t)\cos(\phi - \hat{\phi}) + \tfrac{1}{2} A(t)\sin(\phi - \hat{\phi}) \tag{5.2-7}$$

The expressions 5.2-6 and 5.2-7 clearly indicate that the phase error in the demodulation
of QAM and M-PSK signals has a much more severe effect than in the demodulation
of a PAM signal. Not only is there a reduction in the power of the desired signal
component by a factor $\cos^2(\phi - \hat{\phi})$, but there is also crosstalk interference from the
in-phase and quadrature components. Since the average power levels of $A(t)$ and $B(t)$
are similar, a small phase error causes a large degradation in performance. Hence, the
phase accuracy requirements for QAM and multiphase coherent PSK are much higher
than for DSB/SC PAM.
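
Equations 5.2-6 and 5.2-7 together say that the demodulated pair $(y_I, y_Q)$ is the transmitted point $\frac{1}{2}(A + jB)$ rotated by the phase error. The short Python sketch below uses that complex-rotation view to show the crosstalk numerically; the function name and the sample values $A = B = 1$ with a 10° error are illustrative assumptions.

```python
import cmath
import math

def demodulate_with_phase_error(a, b, phase_error_rad):
    """Return (y_I, y_Q) of Equations 5.2-6 and 5.2-7 for phase error phi - phi_hat.

    The pair equals the real and imaginary parts of 0.5*(a + jb)*exp(j*error).
    """
    y = 0.5 * complex(a, b) * cmath.exp(1j * phase_error_rad)
    return y.real, y.imag

# Illustrative values: equal in-phase and quadrature amplitudes, 10 degree phase error.
a, b = 1.0, 1.0
y_i, y_q = demodulate_with_phase_error(a, b, math.radians(10.0))
print(f"ideal: (0.500, 0.500)  with error: ({y_i:.3f}, {y_q:.3f})")
```

With no phase error both outputs would equal 0.5; with the assumed error the in-phase output shrinks and the quadrature output grows, because each rail picks up a scaled copy of the other.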


5.2-1 Maximum-Likelihood Carrier Phase Estimation 

First, we derive the maximum-likelihood carrier phase estimate. For simplicity, we
assume that the delay $\tau$ is known and, in particular, we set $\tau = 0$. The function to be
maximized is the likelihood function given in Equation 5.1-8. With $\phi$ substituted for $\boldsymbol{\theta}$,
this function becomes

$$\Lambda(\phi) = \exp\left\{ -\frac{1}{N_0} \int_{T_0} [r(t) - s(t; \phi)]^2 \, dt \right\}$$
$$= \exp\left\{ -\frac{1}{N_0} \int_{T_0} r^2(t) \, dt + \frac{2}{N_0} \int_{T_0} r(t) s(t; \phi) \, dt - \frac{1}{N_0} \int_{T_0} s^2(t; \phi) \, dt \right\} \tag{5.2-8}$$

Note that the first term of the exponential factor does not involve the signal parameter $\phi$.
The third term, which contains the integral of $s^2(t; \phi)$, is a constant equal to the signal
energy over the observation interval $T_0$ for any value of $\phi$. Only the second term, which
involves the cross correlation of the received signal $r(t)$ with the signal $s(t; \phi)$, depends
on the choice of $\phi$. Therefore, the likelihood function $\Lambda(\phi)$ may be expressed as

$$\Lambda(\phi) = C \exp\left[ \frac{2}{N_0} \int_{T_0} r(t) s(t; \phi) \, dt \right] \tag{5.2-9}$$

where $C$ is a constant independent of $\phi$.

The ML estimate $\hat{\phi}_{ML}$ is the value of $\phi$ that maximizes $\Lambda(\phi)$ in Equation 5.2-9.
Equivalently, the value $\hat{\phi}_{ML}$ also maximizes the logarithm of $\Lambda(\phi)$, i.e., the
log-likelihood function

$$\Lambda_L(\phi) = \frac{2}{N_0} \int_{T_0} r(t) s(t; \phi) \, dt \tag{5.2-10}$$

Note that in defining $\Lambda_L(\phi)$ we have ignored the constant term $\ln C$.


EXAMPLE 5.2-1. As an example of the optimization to determine the carrier phase,
let us consider the transmission of the unmodulated carrier $A\cos 2\pi f_c t$. The received
signal is

$$r(t) = A\cos(2\pi f_c t + \phi) + n(t)$$

where $\phi$ is the unknown phase. We seek the value of $\phi$, say $\hat{\phi}_{ML}$, that maximizes

$$\Lambda_L(\phi) = \frac{2A}{N_0} \int_{T_0} r(t)\cos(2\pi f_c t + \phi) \, dt$$

A necessary condition for a maximum is that

$$\frac{d\Lambda_L(\phi)}{d\phi} = 0$$

This condition yields

$$\int_{T_0} r(t)\sin(2\pi f_c t + \hat{\phi}_{ML}) \, dt = 0 \tag{5.2-11}$$

or, equivalently,

$$\hat{\phi}_{ML} = -\tan^{-1}\left[ \int_{T_0} r(t)\sin 2\pi f_c t \, dt \Big/ \int_{T_0} r(t)\cos 2\pi f_c t \, dt \right] \tag{5.2-12}$$

We observe that the optimality condition given by Equation 5.2-11 implies the use of a
loop to extract the estimate as illustrated in Figure 5.2-1. The loop filter is an integrator
whose bandwidth is proportional to the reciprocal of the integration interval $T_0$. On the
other hand, Equation 5.2-12 implies an implementation that uses quadrature carriers
to cross-correlate with $r(t)$. Then $\hat{\phi}_{ML}$ is the inverse tangent of the ratio of these two
correlator outputs, as shown in Figure 5.2-2. Note that this estimation scheme yields
$\hat{\phi}_{ML}$ explicitly.


FIGURE 5.2-1
A PLL for obtaining the ML estimate of the phase of an
unmodulated carrier.

FIGURE 5.2-2
A (one-shot) ML estimate of the phase of an
unmodulated carrier.



This example clearly demonstrates that the PLL provides the ML estimate of the 
phase of an unmodulated carrier. 
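
The one-shot estimator of Equation 5.2-12 is easy to try on sampled data. The following Python sketch replaces the two integrals by Riemann sums over an observation interval containing a whole number of carrier cycles. All numerical values (carrier frequency, sample spacing, noise level, seed) are hypothetical, and using `atan2` rather than a plain arctangent of the ratio is our choice, made so the estimate is resolved over the full range $(-\pi, \pi]$.

```python
import math
import random

def ml_phase_estimate(samples, fc_hz, dt):
    """One-shot ML phase estimate of an unmodulated carrier (Equation 5.2-12).

    The integrals over the observation interval are approximated by sums.
    atan2 keeps the signs of both correlator outputs, which a plain
    arctangent of their ratio would lose (an assumption of this sketch).
    """
    corr_sin = sum(r * math.sin(2.0 * math.pi * fc_hz * k * dt)
                   for k, r in enumerate(samples)) * dt
    corr_cos = sum(r * math.cos(2.0 * math.pi * fc_hz * k * dt)
                   for k, r in enumerate(samples)) * dt
    return -math.atan2(corr_sin, corr_cos)

# Hypothetical test signal: 100 Hz carrier observed for 1 s (100 full cycles).
fc_hz, dt, n_samples = 100.0, 1e-4, 10_000
true_phase, amplitude, noise_std = 0.7, 1.0, 0.5
rng = random.Random(1)
r = [amplitude * math.cos(2.0 * math.pi * fc_hz * k * dt + true_phase)
     + rng.gauss(0.0, noise_std)
     for k in range(n_samples)]

phase_hat = ml_phase_estimate(r, fc_hz, dt)
print(f"true phase = {true_phase:.3f} rad, estimate = {phase_hat:.3f} rad")
```

The amplitude $A$ and the factor $2/N_0$ do not appear in the estimator because they scale both correlator outputs equally and cancel in the ratio.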


5.2-2 The Phase-Locked Loop 

The PLL basically consists of a multiplier, a loop filter, and a voltage-controlled
oscillator (VCO), as shown in Figure 5.2-3. If we assume that the input to the PLL is
the sinusoid $\cos(2\pi f_c t + \phi)$ and the output of the VCO is $\sin(2\pi f_c t + \hat{\phi})$, where $\hat{\phi}$
represents the estimate of $\phi$, the product of these two signals is

$$e(t) = \cos(2\pi f_c t + \phi)\sin(2\pi f_c t + \hat{\phi}) = \tfrac{1}{2}\sin(\hat{\phi} - \phi) + \tfrac{1}{2}\sin(4\pi f_c t + \phi + \hat{\phi}) \tag{5.2-13}$$

The loop filter is a low-pass filter that responds only to the low-frequency component
$\frac{1}{2}\sin(\hat{\phi} - \phi)$ and removes the component at $2f_c$. This filter is usually selected to
have the relatively simple transfer function

$$G(s) = \frac{1 + \tau_2 s}{1 + \tau_1 s} \tag{5.2-14}$$

where $\tau_1$ and $\tau_2$ are design parameters ($\tau_1 \gg \tau_2$) that control the bandwidth of the loop.
A higher-order filter that contains additional poles may be used if necessary to obtain
a better loop response.

The output of the loop filter provides the control voltage $v(t)$ for the VCO. The
VCO is basically a sinusoidal signal generator with an instantaneous phase given by

$$2\pi f_c t + \hat{\phi}(t) = 2\pi f_c t + K \int_{-\infty}^{t} v(\tau) \, d\tau \tag{5.2-15}$$

where $K$ is a gain constant in rad/V. Hence,

$$\hat{\phi}(t) = K \int_{-\infty}^{t} v(\tau) \, d\tau \tag{5.2-16}$$

FIGURE 5.2-3
Basic elements of a phase-locked loop (PLL).

FIGURE 5.2-4
Model of phase-locked loop.
By neglecting the double-frequency term resulting from the multiplication of the input
signal with the output of the VCO, we may reduce the PLL into the equivalent closed-loop
system model shown in Figure 5.2-4. The sine function of the phase difference
$\hat{\phi} - \phi$ makes this system non-linear, and, as a consequence, the analysis of its
performance in the presence of noise is somewhat involved, but, nevertheless, it is
mathematically tractable for some simple loop filters.
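
The behavior of this non-linear model is easy to explore numerically in the simplest case. The sketch below assumes a first-order loop, $G(s) = 1$, a constant input phase, and no noise, so that the model reduces to $d\hat{\phi}/dt = -K\sin(\hat{\phi} - \phi)$, and integrates it with a forward-Euler step. The negative sign (negative feedback), the gain, and the step size are assumptions of this illustration rather than values from the text.

```python
import math

def simulate_first_order_pll(phi, phi_hat0, K, dt, n_steps):
    """Forward-Euler integration of d(phi_hat)/dt = -K * sin(phi_hat - phi).

    Models a noise-free first-order loop (G(s) = 1) with the factor 1/2 from the
    multiplier absorbed into K. Returns the list of phase estimates over time.
    """
    phi_hat = phi_hat0
    history = [phi_hat]
    for _ in range(n_steps):
        phi_hat += -K * math.sin(phi_hat - phi) * dt
        history.append(phi_hat)
    return history

# Hypothetical loop: input phase 1 rad, initial estimate 0 rad, K = 50 rad/s.
history = simulate_first_order_pll(phi=1.0, phi_hat0=0.0, K=50.0, dt=1e-3, n_steps=400)
print(f"final phase estimate = {history[-1]:.6f} rad")
```

Once the error becomes small, $\sin(\hat{\phi} - \phi)$ is nearly equal to its argument and the error decays approximately exponentially with time constant $1/K$, which is the regime in which a linear analysis applies.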

In normal operation when the loop is tracking the phase of the incoming carrier,
the phase error $\hat{\phi} - \phi$ is small and, hence,

$$\sin(\hat{\phi} - \phi) \approx \hat{\phi} - \phi \tag{5.2-17}$$


With this approximation, the PLL becomes linear and is characterized by the closed-loop
transfer function

$$H(s) = \frac{KG(s)/s}{1 + KG(s)/s} \tag{5.2-18}$$

where the factor of $\frac{1}{2}$ has been absorbed into the gain parameter $K$. By substituting
from Equation 5.2-14 for $G(s)$ into Equation 5.2-18, we obtain

$$H(s) = \frac{1 + \tau_2 s}{1 + (\tau_2 + 1/K)s + (\tau_1/K)s^2} \tag{5.2-19}$$

Hence, the closed-loop system for the linearized PLL is second-order when $G(s)$ is
given by Equation 5.2-14. The parameter $\tau_2$ controls the position of the zero, while $K$
and $\tau_1$ are used to control the position of the closed-loop system poles. It is customary
to express the denominator of $H(s)$ in the standard form

$$D(s) = s^2 + 2\zeta\omega_n s + \omega_n^2 \tag{5.2-20}$$

where $\zeta$ is called the loop damping factor and $\omega_n$ is the natural frequency of the loop. In
terms of the loop parameters, $\omega_n = \sqrt{K/\tau_1}$ and $\zeta = \omega_n(\tau_2 + 1/K)/2$, and the closed-loop
transfer function becomes

$$H(s) = \frac{(2\zeta\omega_n - \omega_n^2/K)s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{5.2-21}$$
FIGURE 5.2-5

Frequency response of a second-order loop. [ From Phaselock Techniques, 2nd edition, by F. M. 
Gardner, © 1979 by John Wiley and Sons, Inc. Reprinted with permission of the publisher .] 


The (one-sided) noise-equivalent bandwidth (see Problem 2.52) of the loop is

$$B_{eq} = \frac{\tau_2^2 (1/\tau_2^2 + K/\tau_1)}{4(\tau_2 + 1/K)} = \frac{1 + (\tau_2 \omega_n)^2}{8\zeta/\omega_n} \tag{5.2-22}$$

The magnitude response $20 \log |H(\omega)|$ as a function of the normalized frequency
$\omega/\omega_n$ is illustrated in Figure 5.2-5, with the damping factor $\zeta$ as a parameter and
$\tau_1 \gg 1$. Note that $\zeta = 1$ results in a critically damped loop response, $\zeta < 1$ produces
an underdamped response, and $\zeta > 1$ yields an overdamped response.
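
As a quick consistency check on these relations, the Python sketch below computes $\omega_n$, $\zeta$, and both forms of $B_{eq}$ in Equation 5.2-22 from the loop parameters. The function name and the parameter values are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math

def loop_parameters(K, tau1, tau2):
    """Natural frequency, damping factor, and both forms of B_eq (Equation 5.2-22)."""
    omega_n = math.sqrt(K / tau1)
    zeta = omega_n * (tau2 + 1.0 / K) / 2.0
    b_eq_from_taus = tau2 ** 2 * (1.0 / tau2 ** 2 + K / tau1) / (4.0 * (tau2 + 1.0 / K))
    b_eq_from_zeta = (1.0 + (tau2 * omega_n) ** 2) / (8.0 * zeta / omega_n)
    return omega_n, zeta, b_eq_from_taus, b_eq_from_zeta

# Hypothetical loop with tau1 much larger than tau2.
omega_n, zeta, b1, b2 = loop_parameters(K=1000.0, tau1=10.0, tau2=0.1)
print(f"omega_n = {omega_n:.3f} rad/s, zeta = {zeta:.3f}, B_eq = {b1:.4f} = {b2:.4f}")
```

For these assumed values the loop is underdamped ($\zeta \approx 0.5$), and the two expressions for $B_{eq}$ agree, as they must since $2\zeta/\omega_n = \tau_2 + 1/K$.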

In practice, the selection of the bandwidth of the PLL involves a tradeoff between 
speed of response and noise in the phase estimate, which is the topic considered below. 
On the one hand, it is desirable to select the bandwidth of the loop to be sufficiently 
wide to track any time variations in the phase of the received carrier. On the other hand, 
a wideband PLL allows more noise to pass into the loop, which corrupts the phase 
estimate. Below, we assess the effects of noise in the quality of the phase estimate. 



5.2-3 Effect of Additive Noise on the Phase Estimate 

In order to evaluate the effects of noise on the estimate of the carrier phase, let us assume
that the noise at the input to the PLL is narrowband. For this analysis, we assume that
the PLL is tracking a sinusoidal signal of the form

$$s(t) = A_c \cos[2\pi f_c t + \phi(t)] \tag{5.2-23}$$

that is corrupted by the additive narrowband noise

$$n(t) = x(t)\cos 2\pi f_c t - y(t)\sin 2\pi f_c t \tag{5.2-24}$$

The in-phase and quadrature components of the noise are assumed to be statistically
independent, stationary Gaussian noise processes with (two-sided) power spectral density
$\frac{1}{2}N_0$ W/Hz. By using simple trigonometric identities, the noise term in Equation
5.2-24 can be expressed as

$$n(t) = n_i(t)\cos[2\pi f_c t + \phi(t)] - n_q(t)\sin[2\pi f_c t + \phi(t)] \tag{5.2-25}$$

where

$$n_i(t) = x(t)\cos\phi(t) + y(t)\sin\phi(t)$$
$$n_q(t) = -x(t)\sin\phi(t) + y(t)\cos\phi(t) \tag{5.2-26}$$

We note that

$$n_i(t) + j n_q(t) = [x(t) + j y(t)]\, e^{-j\phi(t)}$$

so that the quadrature components $n_i(t)$ and $n_q(t)$ have exactly the same statistical
characteristics as $x(t)$ and $y(t)$.

If $s(t) + n(t)$ is multiplied by the output of the VCO and the double-frequency
terms are neglected, the input to the loop filter is the noise-corrupted signal

$$e(t) = A_c \sin\Delta\phi + n_i(t)\sin\Delta\phi - n_q(t)\cos\Delta\phi = A_c \sin\Delta\phi + n_1(t) \tag{5.2-27}$$

where, by definition, $\Delta\phi = \hat{\phi} - \phi$ is the phase error. Thus, we have the equivalent
model for the PLL with additive noise as shown in Figure 5.2-6.

FIGURE 5.2-6
Equivalent PLL model with additive noise.

When the power $P_c = \frac{1}{2}A_c^2$ of the incoming signal is much larger than the noise
power, we may linearize the PLL and, thus, easily determine the effect of the additive
noise on the quality of the estimate $\hat{\phi}$. Under these conditions, the model for the
linearized PLL with additive noise is illustrated in Figure 5.2-7. Note that the gain
parameter $A_c$ may be normalized to unity, provided that the noise terms are scaled by
$1/A_c$, i.e., the noise terms become

$$n_2(t) = \frac{n_i(t)}{A_c}\sin\Delta\phi - \frac{n_q(t)}{A_c}\cos\Delta\phi \tag{5.2-28}$$

FIGURE 5.2-7
Linearized PLL model with additive noise.

The noise term $n_2(t)$ is zero-mean Gaussian with a power spectral density $N_0/2A_c^2$.
Since the noise $n_2(t)$ is additive at the input to the loop, the variance of the phase error
$\Delta\phi$, which is also the variance of the VCO output phase, is

$$\sigma_{\hat{\phi}}^2 = \frac{N_0}{2A_c^2} \int_{-\infty}^{\infty} |H(f)|^2 \, df = \frac{N_0}{A_c^2} \int_0^{\infty} |H(f)|^2 \, df = \frac{N_0 B_{eq}}{A_c^2} \tag{5.2-29}$$

where $B_{eq}$ is the (one-sided) equivalent noise bandwidth of the loop, given in Equation
5.2-22. Note that $\sigma_{\hat{\phi}}^2$ is simply the ratio of the total noise power within the bandwidth of
the PLL to the signal power. Hence,

$$\sigma_{\hat{\phi}}^2 = \frac{1}{\gamma_L} \tag{5.2-30}$$

where $\gamma_L$ is defined as the signal-to-noise ratio

$$\mathrm{SNR} \equiv \gamma_L = \frac{A_c^2}{N_0 B_{eq}} \tag{5.2-31}$$

The expression for the variance $\sigma_{\hat{\phi}}^2$ of the VCO phase error applies to the case where
the SNR is sufficiently high that the linear model for the PLL applies. An exact analysis
based on the non-linear PLL is mathematically tractable when $G(s) = 1$, which results
in a first-order loop. In this case, the probability density function for the phase error
may be derived (see Viterbi, 1966) and has the form

$$p(\Delta\phi) = \frac{\exp(\gamma_L \cos\Delta\phi)}{2\pi I_0(\gamma_L)} \tag{5.2-32}$$

where $\gamma_L$ is the SNR given by Equation 5.2-31 with $B_{eq}$ being the appropriate noise
bandwidth of the first-order loop, and $I_0(\cdot)$ is the modified Bessel function of order zero.
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FIGURE 5.2-8 

Comparison of VCO phase variance for exact and approximate 
(linear model) first-order PLL. [From Principles of Coherent 
Communication, by A. J. Viterbi; ©1966 by McGraw-Hill Book 
Company. Reprinted with permission of the publisher.] 


From the expression for $p(\Delta\phi)$, we may obtain the exact value of the variance of
the phase error for a first-order PLL. This is plotted in Figure 5.2-8 as a function of
$1/\gamma_L$. Also shown for comparison is the result obtained with the linearized PLL model.
Note that the variance for the linear model is close to the exact variance for $\gamma_L > 3$.
Hence, the linear model is adequate for practical purposes.
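
The comparison between the exact and linearized variances can be checked numerically. The following Python sketch integrates $\Delta\phi^2\,p(\Delta\phi)$ from Equation 5.2-32 over $(-\pi, \pi)$ with a midpoint rule and compares the result with $1/\gamma_L$ from Equation 5.2-30. The function names and the grid size are illustrative assumptions, not part of the text; `np.i0` is NumPy's modified Bessel function of order zero.

```python
import numpy as np

def phase_error_pdf(dphi, gamma_l):
    """First-order-loop phase-error PDF of Equation 5.2-32."""
    return np.exp(gamma_l * np.cos(dphi)) / (2.0 * np.pi * np.i0(gamma_l))

def exact_phase_variance(gamma_l, n=20000):
    """Midpoint-rule integral of dphi^2 * p(dphi) over (-pi, pi)."""
    edges = np.linspace(-np.pi, np.pi, n + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    return float(np.sum(mid ** 2 * phase_error_pdf(mid, gamma_l)) * width)

def linear_phase_variance(gamma_l):
    """Linearized-loop variance of Equation 5.2-30."""
    return 1.0 / gamma_l

for gamma_l in (1.0, 3.0, 10.0):
    print(gamma_l, exact_phase_variance(gamma_l), linear_phase_variance(gamma_l))
```

At low loop SNR the exact variance exceeds the linear prediction and approaches the variance $\pi^2/3$ of a uniformly distributed phase; at high loop SNR the two values converge.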

Approximate analyses of the statistical characteristics of the phase error for the
non-linear PLL have also been performed. Of particular importance is the transient behavior
of the PLL during initial acquisition. Another important problem is the behavior of the PLL
at low SNR. It is known, for example, that when the SNR at the input to the PLL drops
below a certain value, there is a rapid deterioration in the performance of the PLL.
The loop begins to lose lock and an impulsive type of noise, characterized as clicks, is 
generated which degrades the performance of the loop. Results on these topics can be 
found in the texts by Viterbi (1966), Lindsey (1972), Lindsey and Simon (1973), and 
Gardner (1979), and in the survey papers by Gupta (1975) and Lindsey and Chie (1981). 

Up to this point, we have considered carrier phase estimation when the carrier 
signal is unmodulated. Below, we consider carrier phase recovery when the signal 
carries information. 


5.2-4 Decision-Directed Loops

A problem arises in maximizing either Equation 5.2-9 or 5.2-10 when the signal $s(t;\phi)$
carries the information sequence $\{I_n\}$. In this case we can adopt one of two approaches:
either we assume that $\{I_n\}$ is known or we treat $\{I_n\}$ as a random sequence and average
over its statistics.

In decision-directed parameter estimation, we assume that the information sequence
$\{I_n\}$ over the observation interval has been estimated and, in the absence of
demodulation errors, $\tilde I_n = I_n$, where $\tilde I_n$ denotes the detected value of the information
symbol $I_n$. In this case $s(t;\phi)$ is completely known except for the carrier phase.

To be specific, let us consider the decision-directed phase estimate for the class of
linear modulation techniques for which the received equivalent low-pass signal may
be expressed as

$$r_l(t) = e^{-j\phi}\sum_n I_n g(t-nT) + z(t) = s_l(t)e^{-j\phi} + z(t) \tag{5.2-33}$$

where $s_l(t)$ is a known signal if the sequence $\{I_n\}$ is assumed known. The likelihood
function and corresponding log-likelihood function for the equivalent low-pass signal
are

$$\Lambda(\phi) = C\exp\left\{\mathrm{Re}\left[\frac{1}{N_0}\int_{T_0} r_l(t)s_l^*(t)e^{j\phi}\,dt\right]\right\} \tag{5.2-34}$$

$$\Lambda_L(\phi) = \mathrm{Re}\left\{\left[\frac{1}{N_0}\int_{T_0} r_l(t)s_l^*(t)\,dt\right]e^{j\phi}\right\} \tag{5.2-35}$$


If we substitute for $s_l(t)$ in Equation 5.2-35 and assume that the observation interval
$T_0 = KT$, where $K$ is a positive integer, we obtain

$$\Lambda_L(\phi) = \mathrm{Re}\left\{e^{j\phi}\frac{1}{N_0}\sum_{n=0}^{K-1} I_n^*\int_{nT}^{(n+1)T} r_l(t)g^*(t-nT)\,dt\right\} = \mathrm{Re}\left\{e^{j\phi}\frac{1}{N_0}\sum_{n=0}^{K-1} I_n^* y_n\right\} \tag{5.2-36}$$

where, by definition

$$y_n = \int_{nT}^{(n+1)T} r_l(t)g^*(t-nT)\,dt \tag{5.2-37}$$

Note that $y_n$ is the output of the matched filter in the $n$th signal interval. The ML estimate
of $\phi$ is easily found from Equation 5.2-36 by differentiating the log-likelihood

$$\Lambda_L(\phi) = \mathrm{Re}\left(\frac{1}{N_0}\sum_{n=0}^{K-1} I_n^* y_n\right)\cos\phi - \mathrm{Im}\left(\frac{1}{N_0}\sum_{n=0}^{K-1} I_n^* y_n\right)\sin\phi$$

with respect to $\phi$ and setting the derivative equal to zero. Thus, we obtain

$$\hat\phi_{ML} = -\tan^{-1}\left[\mathrm{Im}\left(\sum_{n=0}^{K-1} I_n^* y_n\right)\Big/\mathrm{Re}\left(\sum_{n=0}^{K-1} I_n^* y_n\right)\right] \tag{5.2-38}$$


We call $\hat\phi_{ML}$ in Equation 5.2-38 the decision-directed (or decision-feedback) carrier
phase estimate. It is easily shown (Problem 5.10) that the mean value of $\hat\phi_{ML}$ is $\phi$, so that
the estimate is unbiased. Furthermore, the PDF of $\hat\phi_{ML}$ can be obtained (Problem 5.11)
by using the procedure described in Section 4.3-2.
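
As an illustration of Equation 5.2-38, the following Python sketch applies the estimator to simulated matched-filter outputs. It assumes a unit-energy pulse so that $y_n = I_n e^{-j\phi} + \text{noise}$, QPSK symbols, and error-free decisions; the function name `dd_phase_estimate`, the noise level, and the block length are hypothetical choices made only for this example.

```python
import numpy as np

def dd_phase_estimate(y, decisions):
    """Equation 5.2-38: phi_hat = -atan[Im(z) / Re(z)], z = sum_n conj(I_n) * y_n."""
    z = np.sum(np.conj(decisions) * y)
    # arctan2 resolves the quadrant that a plain arctangent would leave open.
    return float(-np.arctan2(z.imag, z.real))

rng = np.random.default_rng(0)
K = 200                                   # number of symbols in the observation interval
phi = 0.3                                 # true carrier phase (radians)
symbols = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=K)
noise = 0.1 * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
y = symbols * np.exp(-1j * phi) + noise   # matched-filter outputs y_n
phi_hat = dd_phase_estimate(y, symbols)   # decisions assumed error-free
print(phi_hat)
```

Because the correlation sum grows linearly with $K$ while the noise contribution grows only as $\sqrt{K}$, the estimate tightens as the observation interval is lengthened.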

The block diagram of a double-sideband PAM signal receiver that incorporates 
the decision-directed carrier phase estimate given by Equation 5.2-38 is illustrated in 
Figure 5.2-9. 

FIGURE 5.2-9

Block diagram of double-sideband PAM signal receiver with decision-directed carrier phase
estimation.

Another implementation of the PAM receiver that employs a decision-feedback
PLL (DFPLL) for carrier phase estimation is shown in Figure 5.2-10. The received
double-sideband PAM signal is given by $A(t)\cos(2\pi f_c t + \phi)$, where $A(t) = A_m g(t)$
and $g(t)$ is assumed to be a rectangular pulse of duration $T$. This received signal is
multiplied by the quadrature carriers $c_i(t)$ and $c_q(t)$, as given by Equation 5.2-5, which
are derived from the VCO. The product signal

$$r(t)\cos(2\pi f_c t + \hat\phi) = \tfrac{1}{2}[A(t) + n_i(t)]\cos\Delta\phi - \tfrac{1}{2}n_q(t)\sin\Delta\phi + \text{double-frequency terms} \tag{5.2-39}$$

is used to recover the information carried by $A(t)$. The detector makes a decision on
the symbol that is received every $T$ seconds. Thus, in the absence of decision errors,
it reconstructs $A(t)$ free of any noise. This reconstructed signal is used to multiply the
product of the second quadrature multiplier, which has been delayed by $T$ seconds
to allow the demodulator to reach a decision. Thus, the input to the loop filter in the
absence of decision errors is the error signal

$$\begin{aligned} e(t) &= \tfrac{1}{2}A(t)\{[A(t) + n_i(t)]\sin\Delta\phi - n_q(t)\cos\Delta\phi\} + \text{double-frequency terms} \\ &= \tfrac{1}{2}A^2(t)\sin\Delta\phi + \tfrac{1}{2}A(t)[n_i(t)\sin\Delta\phi - n_q(t)\cos\Delta\phi] + \text{double-frequency terms} \end{aligned} \tag{5.2-40}$$



FIGURE 5.2-10

Carrier recovery with a decision-feedback PLL.

FIGURE 5.2-11

Block diagram of QAM signal receiver with decision-directed carrier phase estimation.

The loop filter is low-pass and, hence, it rejects the double-frequency term in $e(t)$.
The desired component is $A^2(t)\sin\Delta\phi$, which contains the phase error for driving the
loop.
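
The action of the DFPLL can be illustrated with a symbol-rate, complex-baseband sketch in Python. This is a simplified analogue of Figure 5.2-10 rather than the passband circuit itself: it assumes binary symbols, one sample per symbol of the form $a_n e^{-j\phi} + \text{noise}$, and a first-order loop in which the loop filter is replaced by a single gain `mu`. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def dfpll_track(r, mu=0.1, phi_hat0=0.0):
    """First-order decision-feedback loop on samples r_n = a_n * exp(-j*phi) + noise."""
    phi_hat = phi_hat0
    history = np.empty(len(r))
    for n, r_n in enumerate(r):
        y = r_n * np.exp(1j * phi_hat)            # derotate with the current estimate
        a_hat = 1.0 if y.real >= 0.0 else -1.0    # binary decision on the in-phase arm
        error = a_hat * y.imag                    # ~ A^2 sin(delta_phi) if the decision is correct
        phi_hat -= mu * error                     # drive delta_phi = phi_hat - phi toward zero
        history[n] = phi_hat
    return history

rng = np.random.default_rng(2)
phi = 0.4
a = rng.choice(np.array([-1.0, 1.0]), size=500)
clean = a * np.exp(-1j * phi)
noisy = clean + 0.05 * (rng.standard_normal(500) + 1j * rng.standard_normal(500))
phi_clean = dfpll_track(clean)[-1]
phi_noisy = dfpll_track(noisy)[-1]
print(phi_clean, phi_noisy)
```

Multiplying the quadrature sample by the decision removes the sign of the data, so the error term depends on $\sin\Delta\phi$ alone as long as the decisions are correct.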

The ML estimate in Equation 5.2-38 is also appropriate for QAM. The block diagram
of a QAM receiver that incorporates the decision-directed carrier phase estimate
is shown in Figure 5.2-11.

In the case of $M$-ary PSK, the DFPLL has the configuration shown in Figure 5.2-12.
The received signal is demodulated to yield the phase estimate

$$\hat\theta_m = \frac{2\pi}{M}(m - 1)$$

which, in the absence of a decision error, is the transmitted signal phase $\theta_m$. The
two outputs of the quadrature multipliers are delayed by the symbol duration $T$ and
multiplied by $\cos\theta_m$ and $\sin\theta_m$ to yield

$$\begin{aligned} r(t)\cos(2\pi f_c t + \hat\phi)\sin\theta_m &= \tfrac{1}{2}[A\cos\theta_m + n_i(t)]\sin\theta_m\cos(\phi - \hat\phi) \\ &\quad - \tfrac{1}{2}[A\sin\theta_m + n_q(t)]\sin\theta_m\sin(\phi - \hat\phi) + \text{double-frequency terms} \\ r(t)\sin(2\pi f_c t + \hat\phi)\cos\theta_m &= -\tfrac{1}{2}[A\cos\theta_m + n_i(t)]\cos\theta_m\sin(\phi - \hat\phi) \\ &\quad - \tfrac{1}{2}[A\sin\theta_m + n_q(t)]\cos\theta_m\cos(\phi - \hat\phi) + \text{double-frequency terms} \end{aligned} \tag{5.2-41}$$

FIGURE 5.2-12

Carrier recovery for M-ary PSK using a decision-feedback PLL.

The two signals are added to generate the error signal

$$e(t) = -\tfrac{1}{2}A\sin(\phi - \hat\phi) + \tfrac{1}{2}n_i(t)\sin(\phi - \hat\phi - \theta_m) + \tfrac{1}{2}n_q(t)\cos(\phi - \hat\phi - \theta_m) + \text{double-frequency terms} \tag{5.2-42}$$

This error signal is the input to the loop filter that provides the control signal for the
VCO.

We observe that the two quadrature noise components in Equation 5.2-42 appear
as additive terms. There is no term involving a product of two noise components as
in an $M$th-power-law device, described in the next section. Consequently, there is no
additional power loss associated with the decision-feedback PLL.

This $M$-phase tracking loop has a phase ambiguity of $360^\circ/M$, which makes it
necessary to differentially encode the information sequence prior to transmission and
differentially decode the received sequence after demodulation to recover the information.
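
The way differential encoding removes the $360^\circ/M$ ambiguity can be sketched in a few lines of Python. Here the PSK phases are represented by integer indices $0, \ldots, M-1$, and a loop that locks with an offset of $k \cdot 360^\circ/M$ is modeled as adding $k$ (mod $M$) to every detected index. The function names and the reference symbol 0 are assumptions made for this illustration.

```python
import numpy as np

def diff_encode(info, M):
    """Transmitted phase indices: running sum of the data mod M, after a reference symbol 0."""
    return np.concatenate(([0], np.cumsum(info) % M))

def diff_decode(detected, M):
    """Recover the data from successive phase differences mod M."""
    return np.diff(detected) % M

M = 4
rng = np.random.default_rng(3)
info = rng.integers(0, M, size=20)
tx = diff_encode(info, M)
for k in range(M):                      # every possible lock point of the loop
    rx = (tx + k) % M                   # detected indices with a k * 360/M degree rotation
    assert np.array_equal(diff_decode(rx, M), info)
```

The constant rotation cancels in each phase difference, so the decoded data are the same for every lock point.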

The ML estimate in Equation 5.2-38 is also appropriate for QAM. The ML estimate
for offset QPSK is also easily obtained (Problem 5.12) by maximizing the log-likelihood
function in Equation 5.2-35, with $s_l(t)$ given as

$$s_l(t) = \sum_n I_n g(t - nT) + j\sum_n J_n g(t - nT - \tfrac{1}{2}T) \tag{5.2-43}$$

where $I_n = \pm 1$ and $J_n = \pm 1$.

Finally, we should also mention that carrier phase recovery for CPM signals can
also be accomplished in a decision-directed manner by use of a PLL. From the optimum
demodulator for CPM signals, which is described in Section 4.3, we can generate an
error signal that is filtered in a loop filter whose output drives a PLL. Alternatively, we
may exploit the linear representation of CPM signals and, thus, employ a generalization
of the carrier phase estimator given by Equation 5.2-38, in which the cross correlation
of the received signal is performed with each of the pulses in the linear representation.
A comprehensive description of carrier phase recovery techniques for CPM is given in
the book by Mengali and D'Andrea (1997).


5.2-5 Non-Decision-Directed Loops 


Instead of using a decision-directed scheme to obtain the phase estimate, we may treat
the data as random variables and simply average $\Lambda(\phi)$ over these random variables
prior to maximization. In order to carry out this integration, we may use either the
actual probability distribution function of the data, if it is known, or, perhaps, we may
assume some probability distribution that might be a reasonable approximation to the
true distribution. The following example illustrates the first approach.

EXAMPLE 5.2-2. Suppose the real signal $s(t)$ carries binary modulation. Then, in a
signal interval, we have

$$s(t) = A\cos 2\pi f_c t, \qquad 0 \le t \le T$$

where $A = \pm 1$ with equal probability. Clearly, the PDF of $A$ is given as

$$p(A) = \tfrac{1}{2}\delta(A - 1) + \tfrac{1}{2}\delta(A + 1)$$

Now, the likelihood function $\Lambda(\phi)$ given by Equation 5.2-9 may be considered as
conditional on a given value of $A$ and must be averaged over the two values. Thus,

$$\begin{aligned} \bar\Lambda(\phi) &= \int_{-\infty}^{\infty}\Lambda(\phi)p(A)\,dA \\ &= \tfrac{1}{2}\exp\left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right] + \tfrac{1}{2}\exp\left[-\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right] \\ &= \cosh\left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right] \end{aligned}$$

and the corresponding log-likelihood function is

$$\bar\Lambda_L(\phi) = \ln\cosh\left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right] \tag{5.2-44}$$


If we differentiate $\bar\Lambda_L(\phi)$ and set the derivative equal to zero, we obtain the ML estimate
for the non-decision-directed estimate. Unfortunately, the functional relationship in
Equation 5.2-44 is highly non-linear and, hence, an exact solution is difficult to obtain.
On the other hand, approximations are possible. In particular,

$$\ln\cosh x = \begin{cases} \tfrac{1}{2}x^2 & (|x| \ll 1) \\ |x| & (|x| \gg 1) \end{cases} \tag{5.2-45}$$

With these approximations, the solution for $\phi$ becomes tractable.
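
A quick numerical check of the two regimes in Equation 5.2-45 is sketched below in Python. The helper `ln_cosh` is a hypothetical name; it uses the identity $\ln\cosh x = |x| + \ln(1 + e^{-2|x|}) - \ln 2$ so that large arguments do not overflow.

```python
import numpy as np

def ln_cosh(x):
    """Overflow-safe ln(cosh(x)) via |x| + log1p(exp(-2|x|)) - ln 2."""
    ax = np.abs(np.asarray(x, dtype=float))
    return ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)

for x in (0.01, 0.1, 1.0, 10.0, 50.0):
    print(x, float(ln_cosh(x)), 0.5 * x * x, abs(x))
```

For large $|x|$ the exact value is $|x| - \ln 2$; the constant offset is negligible relative to $|x|$ and does not move the location of a maximum.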

In this example, we averaged over the two possible values of the information
symbol. When the information symbols are $M$-valued, where $M$ is large, the averaging
operation yields highly non-linear functions of the parameter to be estimated. In such
a case, we may simplify the problem by assuming that the information symbols are
continuous random variables. For example, we may assume that the symbols are
zero-mean Gaussian. The following example illustrates this approximation and the resulting
form for the average likelihood function.

EXAMPLE 5.2-3. Let us consider the same signal as in Example 5.2-2, but now we
assume that the amplitude $A$ is zero-mean Gaussian with unit variance. Thus,

$$p(A) = \frac{1}{\sqrt{2\pi}}e^{-A^2/2}$$

If we average $\Lambda(\phi)$ over the assumed PDF of $A$, we obtain the average likelihood $\bar\Lambda(\phi)$
in the form

$$\bar\Lambda(\phi) = C\exp\left\{\left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right]^2\right\} \tag{5.2-46}$$

and the corresponding log-likelihood as

$$\bar\Lambda_L(\phi) = \left[\frac{2}{N_0}\int_0^T r(t)\cos(2\pi f_c t + \phi)\,dt\right]^2 \tag{5.2-47}$$

We can obtain the ML estimate of $\phi$ by differentiating $\bar\Lambda_L(\phi)$ and setting the derivative
to zero.


It is interesting to note that the log-likelihood function is quadratic under the Gaussian
assumption and that it is approximately quadratic, as indicated in Equation 5.2-45,
for small values of the cross correlation of $r(t)$ with $s(t;\phi)$. In other words, if the cross
correlation over a single interval is small, the Gaussian assumption for the distribution
of the information symbols yields a good approximation to the log-likelihood function.

In view of these results, we may use the Gaussian approximation on all the symbols
in the observation interval $T_0 = KT$. Specifically, we assume that the $K$ information
symbols are statistically independent and identically distributed. By averaging the
likelihood function $\Lambda(\phi)$ over the Gaussian PDF for each of the $K$ symbols in the interval
$T_0 = KT$, we obtain the result

$$\bar\Lambda(\phi) = C\exp\left\{\sum_{n=0}^{K-1}\left[\frac{2}{N_0}\int_{nT}^{(n+1)T} r(t)\cos(2\pi f_c t + \phi)\,dt\right]^2\right\} \tag{5.2-48}$$

FIGURE 5.2-13

Non-decision-directed PLL for carrier phase estimation of PAM signals.


If we take the logarithm of Equation 5.2-48, differentiate the resulting log-likelihood
function, and set the derivative equal to zero, we obtain the condition for the ML
estimate as

$$\sum_{n=0}^{K-1}\int_{nT}^{(n+1)T} r(t)\cos(2\pi f_c t + \hat\phi)\,dt\int_{nT}^{(n+1)T} r(t)\sin(2\pi f_c t + \hat\phi)\,dt = 0 \tag{5.2-49}$$

Although this equation can be manipulated further, its present form suggests the tracking 
loop configuration illustrated in Figure 5.2-13. This loop resembles a Costas loop, 
which is described below. We note that the multiplication of the two signals from the 
integrators destroys the sign carried by the information symbols. The summer plays the 
role of the loop filter. In a tracking loop configuration, the summer may be implemented 
either as a sliding-window digital filter (summer) or as a low-pass digital filter with 
exponential weighting of the past data. 

In a similar manner, one can derive non-decision-directed ML phase estimates for 
QAM and M-PSK. The starting point is to average the likelihood function given by 
Equation 5.2-9 over the statistical characteristics of the data. Here again, we may use the 
Gaussian approximation (two-dimensional Gaussian for complex-valued information 
symbols) in averaging over the information sequence. 

Squaring loop The squaring loop is a non-decision-directed loop that is widely
used in practice to establish the carrier phase of double-sideband suppressed carrier
signals such as PAM. To describe its operation, consider the problem of estimating the
carrier phase of the digitally modulated PAM signal of the form

$$s(t) = A(t)\cos(2\pi f_c t + \phi) \tag{5.2-50}$$

where $A(t)$ carries the digital information. Note that $E[s(t)] = E[A(t)] = 0$ when the
signal levels are symmetric about zero. Consequently, the average value of $s(t)$ does
not produce any phase coherent frequency components at any frequency, including
the carrier. One method for generating a carrier from the received signal is to square
the signal and, thus, to generate a frequency component at $2f_c$, which can be used to
drive a PLL tuned to $2f_c$. This method is illustrated in the block diagram shown in
Figure 5.2-14.

FIGURE 5.2-14

Carrier recovery using a square-law device.


The output of the square-law device is

$$s^2(t) = A^2(t)\cos^2(2\pi f_c t + \phi) = \tfrac{1}{2}A^2(t) + \tfrac{1}{2}A^2(t)\cos(4\pi f_c t + 2\phi) \tag{5.2-51}$$

Since the modulation is a cyclostationary stochastic process, the expected value of
$s^2(t)$ is

$$E[s^2(t)] = \tfrac{1}{2}E[A^2(t)] + \tfrac{1}{2}E[A^2(t)]\cos(4\pi f_c t + 2\phi) \tag{5.2-52}$$

Hence, there is power at the frequency $2f_c$.

If the output of the square-law device is passed through a band-pass filter tuned to the
double-frequency term in Equation 5.2-51, the mean value of the filter output is a sinusoid with
frequency $2f_c$, phase $2\phi$, and amplitude $\frac{1}{2}E[A^2(t)]H(2f_c)$, where $H(2f_c)$ is the gain of
the filter at $f = 2f_c$. Thus, the square-law device has produced a periodic component
from the input signal $s(t)$. In effect, the squaring of $s(t)$ has removed the sign information
contained in $A(t)$ and, thus, has resulted in phase-coherent frequency components at
twice the carrier. The filtered frequency component at $2f_c$ is then used to drive the PLL.
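
The appearance of a spectral line at $2f_c$ can be seen in a short noise-free simulation. The Python sketch below squares a binary PAM signal with rectangular pulses and locates the strongest non-DC component of $s^2(t)$ with an FFT, which stands in for the band-pass filter and PLL. The sampling rate, carrier frequency, symbol length, and the function name `square_law_carrier_line` are assumptions chosen so that $2f_c$ falls exactly on an FFT bin.

```python
import numpy as np

def square_law_carrier_line(s, fs):
    """Return (frequency, phase) of the strongest non-DC line in s(t)**2."""
    n = len(s)
    spectrum = np.fft.rfft(s ** 2) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    k = 1 + int(np.argmax(np.abs(spectrum[1:])))   # skip the DC term A^2/2
    return float(freqs[k]), float(np.angle(spectrum[k]))

fs, fc, phi = 1000.0, 50.0, 0.7          # sample rate (Hz), carrier (Hz), carrier phase (rad)
samples_per_symbol, num_symbols = 100, 64
rng = np.random.default_rng(1)
levels = rng.choice(np.array([-1.0, 1.0]), size=num_symbols)
a_t = np.repeat(levels, samples_per_symbol)        # A(t) with rectangular pulses
t = np.arange(a_t.size) / fs
s = a_t * np.cos(2.0 * np.pi * fc * t + phi)

f_line, phase_line = square_law_carrier_line(s, fs)
phi_hat = phase_line / 2.0               # halving the phase leaves a 180-degree ambiguity
print(f_line, phi_hat)
```

The line sits at $2f_c$ with phase $2\phi$, in agreement with Equation 5.2-51, although the data sequence itself has no discrete component at $f_c$.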

The squaring operation leads to a noise enhancement that increases the noise power 
level at the input to the PLL and results in an increase in the variance of the phase error. 

To elaborate on this point, let the input to the squarer be $s(t) + n(t)$, where $s(t)$ is
given by Equation 5.2-50 and $n(t)$ represents the band-pass additive Gaussian noise
process. By squaring $s(t) + n(t)$, we obtain

$$y(t) = s^2(t) + 2s(t)n(t) + n^2(t) \tag{5.2-53}$$

where $s^2(t)$ is the desired signal component and the other two components are the
signal × noise and noise × noise terms. By computing the autocorrelation functions and
power density spectra of these two noise components, one can easily show that both
components have spectral power in the frequency band centered at $2f_c$. Consequently,
the band-pass filter with bandwidth $B_{bp}$ centered at $2f_c$, which produces the desired
sinusoidal signal component that drives the PLL, also passes noise due to these two terms.

Since the bandwidth of the loop is designed to be significantly smaller than the
bandwidth $B_{bp}$ of the band-pass filter, the total noise spectrum at the input to the PLL
may be approximated as a constant within the loop bandwidth. This approximation
allows us to obtain a simple expression for the variance of the phase error as

$$\sigma_{\hat\phi}^2 = \frac{1}{\gamma_L S_L} \tag{5.2-54}$$

where $S_L$ is called the squaring loss and is given by

$$S_L = \left(1 + \frac{B_{bp}/2B_{eq}}{\gamma_L}\right)^{-1} \tag{5.2-55}$$

Since $S_L < 1$, $S_L^{-1}$ represents the increase in the variance of the phase error caused by
the added noise (noise × noise terms) that results from the squarer. Note, for example,
that when $\gamma_L = B_{bp}/2B_{eq}$, the loss is 3 dB.
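
Equations 5.2-54 and 5.2-55 are simple enough to evaluate directly. The Python sketch below does so and reproduces the 3 dB figure quoted above; the function names and the bandwidth values are hypothetical and serve only as an example.

```python
import math

def squaring_loss(gamma_l, b_bp, b_eq):
    """Squaring loss S_L of Equation 5.2-55."""
    return 1.0 / (1.0 + (b_bp / (2.0 * b_eq)) / gamma_l)

def squaring_loop_phase_variance(gamma_l, b_bp, b_eq):
    """Phase-error variance of Equation 5.2-54."""
    return 1.0 / (gamma_l * squaring_loss(gamma_l, b_bp, b_eq))

b_bp, b_eq = 1000.0, 100.0               # assumed band-pass and loop noise bandwidths (Hz)
gamma_l = b_bp / (2.0 * b_eq)            # loop SNR equal to B_bp / 2 B_eq
loss_db = -10.0 * math.log10(squaring_loss(gamma_l, b_bp, b_eq))
print(loss_db, squaring_loop_phase_variance(gamma_l, b_bp, b_eq))
```

As $\gamma_L$ grows well beyond $B_{bp}/2B_{eq}$, $S_L$ approaches 1 and the variance approaches the value $1/\gamma_L$ of the unmodulated-carrier PLL.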

Finally, we observe that the output of the VCO from the squaring loop must be 
frequency-divided by 2 to generate the phase-locked carrier for signal demodulation. 
It should be noted that the output of the frequency divider has a phase ambiguity of 
180° relative to the phase of the received signal. For this reason, the data must be 
differentially encoded prior to transmission and differentially decoded at the receiver. 


Costas loop Another method for generating a properly phased carrier for a
double-sideband suppressed carrier signal is illustrated by the block diagram shown in
Figure 5.2-15. This scheme was developed by Costas (1956) and is called the Costas
loop. The received signal is multiplied by $\cos(2\pi f_c t + \hat\phi)$ and $\sin(2\pi f_c t + \hat\phi)$, which
are outputs from the VCO. The two products are

$$\begin{aligned} y_c(t) &= [s(t) + n(t)]\cos(2\pi f_c t + \hat\phi) \\ &= \tfrac{1}{2}[A(t) + n_i(t)]\cos\Delta\phi + \tfrac{1}{2}n_q(t)\sin\Delta\phi + \text{double-frequency terms} \\ y_s(t) &= [s(t) + n(t)]\sin(2\pi f_c t + \hat\phi) \\ &= \tfrac{1}{2}[A(t) + n_i(t)]\sin\Delta\phi - \tfrac{1}{2}n_q(t)\cos\Delta\phi + \text{double-frequency terms} \end{aligned} \tag{5.2-56}$$



FIGURE 5.2-15

Block diagram of Costas loop.

where the phase error $\Delta\phi = \hat\phi - \phi$. The double-frequency terms are eliminated by the
low-pass filters following the multiplications.

An error signal is generated by multiplying the two outputs of the low-pass filters.
Thus,

$$e(t) = \tfrac{1}{8}\{[A(t) + n_i(t)]^2 - n_q^2(t)\}\sin(2\Delta\phi) - \tfrac{1}{4}n_q(t)[A(t) + n_i(t)]\cos(2\Delta\phi) \tag{5.2-57}$$

This error signal is filtered by the loop filter, whose output is the control voltage that
drives the VCO. The reader should note the similarity of the Costas loop to the PLL
shown in Figure 5.2-13.

We note that the error signal into the loop filter consists of the desired term
$A^2(t)\sin 2(\hat\phi - \phi)$ plus terms that involve signal × noise and noise × noise. These
terms are similar to the two noise terms at the input to the PLL for the squaring method.
In fact, if the loop filter in the Costas loop is identical to that used in the squaring loop,
the two loops are equivalent. Under this condition, the probability density function of
the phase error and the performance of the two loops are identical.

It is interesting to note that the optimum low-pass filter for rejecting the
double-frequency terms in the Costas loop is a filter matched to the signal pulse in the
information-bearing signal. If matched filters are employed for the low-pass filters,
their outputs could be sampled at the bit rate at the end of each signal interval, and the
discrete-time signal samples could be used to drive the loop. The use of the matched
filter results in a smaller noise into the loop.
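
The sampled form of the loop described in the last paragraph can be sketched in Python. This is a noise-free, complex-baseband illustration under stated assumptions: one matched-filter sample per bit of the form $a_n e^{-j\phi}$ with $a_n = \pm 1$, and a first-order loop whose filter is a single gain `mu`. The function name and parameter values are hypothetical.

```python
import numpy as np

def costas_track(r, mu=0.1, phi_hat0=0.0):
    """First-order Costas loop driven by symbol-rate samples r_n = a_n * exp(-j*phi)."""
    phi_hat = phi_hat0
    for r_n in r:
        y = r_n * np.exp(1j * phi_hat)    # in-phase and quadrature arm outputs
        error = y.real * y.imag           # product of the arms: (a_n^2 / 2) sin(2 delta_phi)
        phi_hat -= mu * error
    return phi_hat

rng = np.random.default_rng(4)
phi = 0.4
a = rng.choice(np.array([-1.0, 1.0]), size=500)
r = a * np.exp(-1j * phi)

lock_a = costas_track(r, phi_hat0=0.0)                 # starts near phi
lock_b = costas_track(r, phi_hat0=phi + np.pi - 0.2)   # starts near phi + 180 degrees
print(lock_a, lock_b)
```

The product of the two arms is insensitive to the sign of $a_n$, so no decisions are needed, but the loop settles at either $\phi$ or $\phi + 180^\circ$ depending on where it starts.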

Finally, we note that, as in the squaring PLL, the output of the VCO contains a
phase ambiguity of 180°, necessitating differential encoding of the data
prior to transmission and differential decoding at the demodulator.


Carrier estimation for multiple phase signals When the digital information is
transmitted via $M$-phase modulation of a carrier, the methods described above can
be generalized to provide the properly phased carrier for demodulation. The received
$M$-phase signal, excluding the additive noise, may be expressed as

$$s(t) = A\cos\left[2\pi f_c t + \phi + \frac{2\pi}{M}(m - 1)\right], \qquad m = 1, 2, \ldots, M \tag{5.2-58}$$


where $2\pi(m - 1)/M$ represents the information-bearing component of the signal phase.
The problem in carrier recovery is to remove the information-bearing component and,
thus, to obtain the unmodulated carrier $\cos(2\pi f_c t + \phi)$. One method by which this
can be accomplished is illustrated in Figure 5.2-16, which represents a generalization
of the squaring loop. The signal is passed through an $M$th-power-law device, which
generates a number of harmonics of $f_c$. The band-pass filter selects the harmonic
$\cos(2\pi M f_c t + M\phi)$ for driving the PLL. The term

$$\frac{2\pi}{M}(m - 1)M = 2\pi(m - 1) \equiv 0 \pmod{2\pi}, \qquad m = 1, 2, \ldots, M$$

Thus, the information is removed. The VCO output is $\sin(2\pi M f_c t + M\hat\phi)$, so this
output is divided in frequency by $M$ to yield $\sin(2\pi f_c t + \hat\phi)$, and phase-shifted by $\frac{1}{2}\pi$
rad to yield $\cos(2\pi f_c t + \hat\phi)$. These components are then fed to the demodulator. Although
not explicitly shown, there is a phase ambiguity in these reference sinusoids of $360^\circ/M$,
which can be overcome by differential encoding of the data at the transmitter and
differential decoding after demodulation at the receiver.

FIGURE 5.2-16

Carrier recovery with Mth-power-law device for M-ary PSK.
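
The removal of the modulation by the $M$th-power operation can be demonstrated at complex baseband. The Python sketch below is a noise-free analogue of the passband scheme, not the circuit of Figure 5.2-16: it assumes symbol-rate samples $e^{j[\phi + 2\pi(m-1)/M]}$, raises them to the $M$th power, and divides the resulting phase by $M$. The function name is a hypothetical choice.

```python
import numpy as np

def mth_power_phase_estimate(r, M):
    """Phase of the averaged M-th power divided by M; ambiguous by multiples of 2*pi/M."""
    z = np.mean(r ** M)                  # each sample contributes exp(j*M*phi)
    return float(np.angle(z) / M)

M = 4
rng = np.random.default_rng(5)
m = rng.integers(0, M, size=100)         # data phase indices (m - 1 in the text)
phi = 0.2
r = np.exp(1j * (phi + 2.0 * np.pi * m / M))
est = mth_power_phase_estimate(r, M)

r_rot = np.exp(1j * (phi + 2.0 * np.pi / M + 2.0 * np.pi * m / M))
est_rot = mth_power_phase_estimate(r_rot, M)   # carrier phase advanced by 360/M degrees
print(est, est_rot)
```

Both runs return the same estimate, which is the $360^\circ/M$ ambiguity noted above.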

Just as in the case of the squaring PLL, the $M$th-power PLL operates in the presence
of noise that has been enhanced by the $M$th-power-law device, which results in the
output

$$y(t) = [s(t) + n(t)]^M$$

The variance of the phase error in the PLL resulting from the additive noise may be
expressed in the simple form

$$\sigma_{\hat\phi}^2 = \frac{S_{ML}^{-1}}{\gamma_L} \tag{5.2-59}$$

where $\gamma_L$ is the loop SNR and $S_{ML}^{-1}$ is the $M$-phase power loss. $S_{ML}$ has been evaluated
by Lindsey and Simon (1973) for $M = 4$ and 8.

Another method for carrier recovery in $M$-ary PSK is based on a generalization
of the Costas loop. That method requires multiplying the received signal by $M$
phase-shifted carriers of the form

$$\sin\left[2\pi f_c t + \hat\phi + \frac{\pi}{M}(k - 1)\right], \qquad k = 1, 2, \ldots, M$$

low-pass-filtering each product, and then multiplying the outputs of the low-pass filters
to generate the error signal. The error signal excites the loop filter, which, in turn,
provides the control signal for the VCO. This method is relatively complex to implement
and, consequently, has not been generally used in practice.


Comparison of decision-directed with non-decision-directed loops We note that
the decision-feedback phase-locked loop (DFPLL) differs from the Costas loop only in
the method by which $A(t)$ is rectified for the purpose of removing the modulation. In
the Costas loop, each of the two quadrature signals used to rectify $A(t)$ is corrupted by
noise. In the DFPLL, only one of the signals used to rectify $A(t)$ is corrupted by noise.
On the other hand, the squaring loop is similar to the Costas loop in terms of the noise
effect on the estimate $\hat\phi$. Consequently, the DFPLL is superior in performance to both
the Costas loop and the squaring loop, provided that the demodulator is operating at
error rates below $10^{-2}$, where an occasional decision error has a negligible effect on $\hat\phi$.
Quantitative comparisons of the variance of the phase errors in a Costas loop to those
in a DFPLL have been made by Lindsey and Simon (1973), and show that the variance
of the DFPLL is 4-10 times smaller for signal-to-noise ratios per bit above 0 dB.


5.3 SYMBOL TIMING ESTIMATION

In a digital communication system, the output of the demodulator must be sampled
periodically at the symbol rate, at the precise sampling time instants $t_m = mT + \tau$, where
$T$ is the symbol interval and $\tau$ is a nominal time delay that accounts for the propagation
time of the signal from the transmitter to the receiver. To perform this periodic sampling,
we require a clock signal at the receiver. The process of extracting such a clock signal
at the receiver is usually called symbol synchronization or timing recovery.

Timing recovery is one of the most critical functions that is performed at the receiver
of a synchronous digital communication system. We should note that the receiver must
know not only the frequency ($1/T$) at which the outputs of the matched filters or
correlators are sampled, but also where to take the samples within each symbol interval.
The choice of sampling instant within the symbol interval of duration $T$ is called the
timing phase.

Symbol synchronization can be accomplished in one of several ways. In some 
communication systems, the transmitter and receiver clocks are synchronized to a 
master clock, which provides a very precise timing signal. In this case, the receiver 
must estimate and compensate for the relative time delay between the transmitted and 
received signals. Such may be the case for radio communication systems that operate 
in the very low frequency (VLF) band (below 30 kHz), where precise clock signals are 
transmitted from a master radio station. 

Another method for achieving symbol synchronization is for the transmitter to
simultaneously transmit the clock frequency $1/T$ or a multiple of $1/T$ along with
the information signal. The receiver may simply employ a narrowband filter tuned to
the transmitted clock frequency and, thus, extract the clock signal for sampling. This
approach has the advantage of being simple to implement. There are several
disadvantages, however. One is that the transmitter must allocate some of its available power to
the transmission of the clock signal. Another is that some small fraction of the available
channel bandwidth must be allocated for the transmission of the clock signal. In spite
of these disadvantages, this method is frequently used in telephone transmission
systems that employ large bandwidths to transmit the signals of many users. In such a case,
the transmission of a clock signal is shared in the demodulation of the signals among
the many users. Through this shared use of the clock signal, the penalty in the transmitter
power and in bandwidth allocation is reduced proportionally by the number of users.

A clock signal can also be extracted from the received data signal. There are a
number of different methods that can be used at the receiver to achieve self-synchronization.
In this section, we treat both decision-directed and non-decision-directed methods.


5.3-1 Maximum-Likelihood Timing Estimation 


Let us begin by obtaining the ML estimate of the time delay $\tau$. If the signal is a baseband
PAM waveform, it is represented as

$$r(t) = s(t;\tau) + n(t) \tag{5.3-1}$$

where

$$s(t;\tau) = \sum_n I_n g(t - nT - \tau) \tag{5.3-2}$$

As in the case of ML phase estimation, we distinguish between two types of timing
estimators, decision-directed timing estimators and non-decision-directed estimators.
In the former, the information symbols from the output of the demodulator are treated as
the known transmitted sequence. In this case, the log-likelihood function has the form

$$\Lambda_L(\tau) = C_L\int_{T_0} r(t)s(t;\tau)\,dt \tag{5.3-3}$$

If we substitute Equation 5.3-2 into Equation 5.3-3, we obtain

$$\Lambda_L(\tau) = C_L\sum_n I_n\int_{T_0} r(t)g(t - nT - \tau)\,dt = C_L\sum_n I_n y_n(\tau) \tag{5.3-4}$$

where $y_n(\tau)$ is defined as

$$y_n(\tau) = \int_{T_0} r(t)g(t - nT - \tau)\,dt \tag{5.3-5}$$

A necessary condition for $\hat\tau$ to be the ML estimate of $\tau$ is that

$$\frac{d\Lambda_L(\tau)}{d\tau} = \sum_n I_n\frac{d}{d\tau}\int_{T_0} r(t)g(t - nT - \tau)\,dt = \sum_n I_n\frac{d}{d\tau}[y_n(\tau)] = 0 \tag{5.3-6}$$

The result in Equation 5.3-6 suggests the implementation of the tracking loop
shown in Figure 5.3-1. We should observe that the summation in the loop serves as
the loop filter whose bandwidth is controlled by the length of the sliding window in
the summation. The output of the loop filter drives the voltage-controlled clock (VCC),
or voltage-controlled oscillator, which controls the sampling times for the input to the
loop. Since the detected information sequence $\{\tilde I_n\}$ is used in the estimation of $\tau$, the
estimate is decision-directed.

FIGURE 5.3-1

Decision-directed ML estimation of timing for baseband PAM.

The technique described above for ML timing estimation of baseband PAM signals 
can be extended to carrier modulated signal formats such as QAM and PSK in a 
straightforward manner, by dealing with the equivalent low-pass form of the signals. 
Thus, the problem of ML estimation of symbol timing for carrier signals is very similar 
to the problem formulation for the baseband PAM signal. 

5.3-2 Non-Decision-Directed Timing Estimation 

A non-decision-directed timing estimate can be obtained by averaging the likelihood
ratio $\Lambda(\tau)$ over the PDF of the information symbols, to obtain $\bar\Lambda(\tau)$, and then differentiating
either $\bar\Lambda(\tau)$ or $\ln\bar\Lambda(\tau) = \bar\Lambda_L(\tau)$ to obtain the condition for the maximum-likelihood
estimate $\hat\tau_{ML}$.

In the case of binary (baseband) PAM, where $I_n = \pm 1$ with equal probability, the
average over the data yields

$$\bar\Lambda_L(\tau) = \sum_n \ln\cosh[Cy_n(\tau)] \tag{5.3-7}$$

just as in the case of the phase estimator. Since $\ln\cosh x \approx \frac{1}{2}x^2$ for small $x$, the
square-law approximation

$$\bar\Lambda_L(\tau) \approx \tfrac{1}{2}C^2\sum_n y_n^2(\tau) \tag{5.3-8}$$

is appropriate for low signal-to-noise ratios. For multilevel PAM, we may approximate
the statistical characteristics of the information symbols $\{I_n\}$ by the Gaussian PDF,
with zero mean and unit variance. When we average $\Lambda(\tau)$ over the Gaussian PDF, the
logarithm of $\bar\Lambda(\tau)$ is identical to $\bar\Lambda_L(\tau)$ given by Equation 5.3-8. Consequently, the
non-decision-directed estimate of $\tau$ may be obtained by differentiating Equation 5.3-8.
The result is an approximation to the ML estimate of the delay time. The derivative of
Equation 5.3-8 is

$$\frac{d}{d\tau}\sum_n y_n^2(\tau) = 2\sum_n y_n(\tau)\frac{dy_n(\tau)}{d\tau} = 0 \tag{5.3-9}$$

where $y_n(\tau)$ is given by Equation 5.3-5.

FIGURE 5.3-2

Non-decision-directed estimation of timing for binary baseband PAM.

An implementation of a tracking loop based on the derivative of $\bar\Lambda_L(\tau)$ given
by Equation 5.3-7 is shown in Figure 5.3-2. Alternatively, an implementation of a
tracking loop based on Equation 5.3-9 is illustrated in Figure 5.3-3. In both structures,
we observe that the summation serves as the loop filter that drives the VCC. It is
interesting to note the resemblance of the timing loop in Figure 5.3-3 to the Costas
loop for phase estimation.

Early-late gate synchronizers Another non-decision-directed timing estimator
exploits the symmetry properties of the signal at the output of the matched filter or
correlator. To describe this method, let us consider the rectangular pulse $s(t)$, $0 \le t \le T$,
shown in Figure 5.3-4a. The output of the filter matched to $s(t)$ attains its maximum
value at time $t = T$, as shown in Figure 5.3-4b. Thus, the output of the matched filter
is the time autocorrelation function of the pulse $s(t)$. Of course, this statement holds
for any arbitrary pulse shape, so the approach that we describe applies in general to
any signal pulse. Clearly, the proper time to sample the output of the matched filter for
a maximum output is at $t = T$, i.e., at the peak of the correlation function.

In the presence of noise, the identification of the peak value of the signal is generally
difficult. Instead of sampling the signal at the peak, suppose we sample early, at $t = T - \delta$,
and late, at $t = T + \delta$. The absolute values of the early samples $|y[m(T - \delta)]|$ and the
late samples $|y[m(T + \delta)]|$ will be smaller (on the average in the presence of noise)



FIGURE 5.3-3 

Non-decision-directed estimation of timing for baseband PAM. 
FIGURE 5.3-4

Rectangular signal pulse (a) and its 
matched filter output (b). 


than the samples of the peak value $|y(mT)|$. Since the autocorrelation function is even
with respect to the optimum sampling time $t = T$, the absolute values of the correlation
function at $t = T - \delta$ and $t = T + \delta$ are equal. Under this condition, the proper sampling
time is the midpoint between $t = T - \delta$ and $t = T + \delta$. This condition forms the basis
for the early-late gate symbol synchronizer.

Figure 5.3-5 illustrates the block diagram of an early-late gate synchronizer. In
this figure, correlators are used in place of the equivalent matched filters. The two
correlators integrate over the symbol interval $T$, but one correlator starts integrating
$\delta$ seconds early relative to the estimated optimum sampling time and the other
integrator starts integrating $\delta$ seconds late relative to the estimated optimum sampling
time. An error signal is formed by taking the difference between the absolute values
of the two correlator outputs. To smooth the noise corrupting the signal samples, the 
error signal is passed through a low-pass filter. If the timing is off relative to the op- 
timum sampling time, the average error signal at the output of the low-pass filter is 
nonzero, and the clock signal is either retarded or advanced, depending on the sign 
of the error. Thus, the smoothed error signal is used to drive a VCC, whose output 
is the desired clock signal that is used for sampling. The output of the VCC is also 
used as a clock signal for a symbol waveform generator that puts out the same basic 
pulse waveform as that of the transmitting filter. This pulse waveform is advanced and 
delayed and then fed to the two correlators, as shown in Figure 5.3-5. Note that if the 
signal pulses are rectangular, there is no need for a signal pulse generator within the 
tracking loop. 
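The loop described above can be sketched in discrete time as follows. This is a minimal illustration under stated assumptions, not the structure of Figure 5.3-5 itself: pulses are rectangular (so no pulse generator is needed), the early and late offsets are whole samples, and the low-pass filter and VCC are replaced by a single accumulator with a small gain. The function name `early_late_gate` and its parameters are hypothetical.

```python
import random


def early_late_gate(r, sps, delta, gain, num_symbols, tau_init=0.0):
    """Track the symbol timing (in samples) of unit rectangular pulses with an early-late gate."""
    tau_hat = tau_init
    for n in range(1, num_symbols):
        base = n * sps + int(round(tau_hat))
        if base - delta < 0 or base + delta + sps > len(r):
            continue                                        # skip windows that fall outside the record
        early = sum(r[base - delta: base - delta + sps])    # correlator started delta samples early
        late = sum(r[base + delta: base + delta + sps])     # correlator started delta samples late
        error = abs(late) - abs(early)                      # positive when tau_hat is too small
        tau_hat += gain * error                             # small gain averages the error (assumed loop filter)
    return tau_hat


rng = random.Random(2)
sps, true_delay = 16, 5
samples = [0.0] * true_delay
for _ in range(1500):
    samples.extend([float(rng.choice((-1, 1)))] * sps)
r = [x + rng.gauss(0.0, 0.2) for x in samples]
print("timing estimate in samples:", early_late_gate(r, sps, delta=2, gain=0.02, num_symbols=1498))
```

A smaller `gain` narrows the loop bandwidth and reduces timing jitter, at the cost of slower acquisition and slower tracking of drift.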



FIGURE 5.3-5 

Block diagram of early-late gate synchronizer. 
We observe that the early-late gate synchronizer is basically a closed-loop control
system whose bandwidth is relatively narrow compared to the symbol rate $1/T$. The
bandwidth of the loop determines the quality of the timing estimate. A narrowband loop 
provides more averaging over the additive noise and, thus, improves the quality of the 
estimated sampling instants, provided that the channel propagation delay is constant 
and the clock oscillator at the transmitter is not drifting with time (or drifting very 
slowly with time). On the other hand, if the channel propagation delay is changing 
with time and/or the transmitter clock is also drifting with time, then the bandwidth of 
the loop must be increased to provide for faster tracking of time variations in symbol 
timing. 

In the tracking mode, the two correlators are affected by adjacent symbols. How- 
ever, if the sequence of information symbols has zero-mean, as is the case for PAM and 
some other signal modulations, the contribution to the output of the correlators from 
adjacent symbols averages out to zero in the low-pass filter. 

An equivalent realization of the early-late gate synchronizer that is somewhat easier
to implement is shown in Figure 5.3-6. In this case the clock signal from the VCC is
advanced and delayed by $\delta$, and these clock signals are used to sample the outputs of
the two correlators.

The early-late gate synchronizer described above is a non-decision-directed es- 
timator of symbol timing that approximates the maximum-likelihood estimator. This 
assertion can be demonstrated by approximating the derivative of the log-likelihood 
function by the finite difference, i.e., 


$$\frac{d\bar{\Lambda}_L(\tau)}{d\tau} \approx \frac{\bar{\Lambda}_L(\tau + \delta) - \bar{\Lambda}_L(\tau - \delta)}{2\delta} \tag{5.3-10}$$



FIGURE 5.3-6 

Block diagram of early-late gate synchronizer — an alternative form. 
If we substitute for $\bar{\Lambda}_L(\tau)$ from Equation 5.3-8 into Equation 5.3-10, we obtain the
approximation for the derivative as

$$\begin{aligned}
\frac{d\bar{\Lambda}_L(\tau)}{d\tau} &\approx \frac{C^2}{4\delta}\sum_n \left[y_n^2(\tau + \delta) - y_n^2(\tau - \delta)\right] \\
&= \frac{C^2}{4\delta}\sum_n \left\{\left[\int_{T_0} r(t)\,g(t - nT - \tau - \delta)\,dt\right]^2 - \left[\int_{T_0} r(t)\,g(t - nT - \tau + \delta)\,dt\right]^2\right\}
\end{aligned} \tag{5.3-11}$$

But the mathematical expression in Equation 5.3-11 basically describes the functions
performed by the early-late gate symbol synchronizers illustrated in Figures 5.3-5
and 5.3-6.


■ 5.4 

JOINT ESTIMATION OF CARRIER PHASE AND SYMBOL TIMING 


The estimation of the carrier phase and symbol timing may be accomplished separately
as described above or jointly. Joint ML estimation of two or more signal parameters
yields estimates that are as good as, and usually better than, the estimates obtained from
separate optimization of the likelihood function. In other words, the variances of the
parameter estimates obtained from joint optimization are less than or equal to the variances
of the parameter estimates obtained from separately optimizing the likelihood function.

Let us consider the joint estimation of the carrier phase and symbol timing. The 
log-likelihood function for these two parameters may be expressed in terms of the 
equivalent low-pass signals as 


$$\Lambda_L(\phi, \tau) = \mathrm{Re}\left[\frac{1}{N_0}\int_{T_0} r(t)\,s_l^*(t; \phi, \tau)\,dt\right] \tag{5.4-1}$$

where $s_l(t; \phi, \tau)$ is the equivalent low-pass signal, which has the general form

$$s_l(t; \phi, \tau) = e^{-j\phi}\left[\sum_n I_n g(t - nT - \tau) + j\sum_n J_n w(t - nT - \tau)\right] \tag{5.4-2}$$

where $\{I_n\}$ and $\{J_n\}$ are the two information sequences.

We note that, for PAM, we may set $J_n = 0$ for all $n$, and the sequence $\{I_n\}$ is real.
For QAM and PSK, we set $J_n = 0$ for all $n$ and the sequence $\{I_n\}$ is complex-valued.
For offset QPSK, both sequences $\{I_n\}$ and $\{J_n\}$ are nonzero and $w(t) = g(t - \frac{1}{2}T)$.

For decision-directed ML estimation of $\phi$ and $\tau$, the log-likelihood function
becomes

$$\Lambda_L(\phi, \tau) = \mathrm{Re}\left\{\frac{e^{j\phi}}{N_0}\sum_n \left[I_n^* y_n(\tau) - jJ_n^* x_n(\tau)\right]\right\} \tag{5.4-3}$$
where

$$\begin{aligned}
y_n(\tau) &= \int_{T_0} r(t)\,g^*(t - nT - \tau)\,dt \\
x_n(\tau) &= \int_{T_0} r(t)\,w^*(t - nT - \tau)\,dt
\end{aligned} \tag{5.4-4}$$



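Necessary conditions for the estimates of $\phi$ and $\tau$ to be the ML estimates are

$$\frac{\partial\Lambda_L(\phi, \tau)}{\partial\phi} = 0, \qquad \frac{\partial\Lambda_L(\phi, \tau)}{\partial\tau} = 0 \tag{5.4-5}$$

It is convenient to define

$$A(\tau) + jB(\tau) = \frac{1}{N_0}\sum_n \left[I_n^* y_n(\tau) - jJ_n^* x_n(\tau)\right] \tag{5.4-6}$$

With this definition, Equation 5.4-3 may be expressed in the simple form

$$\Lambda_L(\phi, \tau) = A(\tau)\cos\phi - B(\tau)\sin\phi \tag{5.4-7}$$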


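Now the conditions in Equation 5.4-5 for the joint ML estimates become

$$\frac{\partial\Lambda_L(\phi, \tau)}{\partial\phi} = -A(\tau)\sin\phi - B(\tau)\cos\phi = 0 \tag{5.4-8}$$

$$\frac{\partial\Lambda_L(\phi, \tau)}{\partial\tau} = \frac{\partial A(\tau)}{\partial\tau}\cos\phi - \frac{\partial B(\tau)}{\partial\tau}\sin\phi = 0 \tag{5.4-9}$$

From Equation 5.4-8, we obtain

$$\hat{\phi}_{ML} = -\tan^{-1}\left[\frac{B(\hat{\tau}_{ML})}{A(\hat{\tau}_{ML})}\right] \tag{5.4-10}$$

The solution to Equation 5.4-9 that incorporates Equation 5.4-10 is

$$\left[A(\tau)\frac{\partial A(\tau)}{\partial\tau} + B(\tau)\frac{\partial B(\tau)}{\partial\tau}\right]_{\tau = \hat{\tau}_{ML}} = 0 \tag{5.4-11}$$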


The decision-directed tracking loop for QAM (or PSK) obtained from these equations
is illustrated in Figure 5.4-1.
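A block (non-tracking) version of the joint estimate can be sketched numerically. The sketch assumes the signal model $s_l = e^{-j\phi}\sum_n I_n g(t - nT - \tau)$ with $J_n = 0$, a real rectangular pulse, known symbols (standing in for correct decisions), and a delay on the sample grid. For a fixed $\tau$, the maximum of $A\cos\phi - B\sin\phi$ over $\phi$ is $\sqrt{A^2 + B^2}$, so the search maximizes $A^2(\tau) + B^2(\tau)$ and then applies Equation 5.4-10. The names `make_qpsk` and `joint_phase_timing_estimate` are hypothetical.

```python
import cmath
import math
import random


def make_qpsk(symbols, sps, delay, phi, noise_std, rng):
    """Low-pass received samples: e^{-j phi} * I_n over each symbol, delayed, plus complex noise."""
    rot = cmath.exp(-1j * phi)
    clean = [0j] * delay
    for i_n in symbols:
        clean.extend([rot * i_n] * sps)
    return [x + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std)) for x in clean]


def joint_phase_timing_estimate(r, symbols, sps, n0=1.0):
    """Grid-search tau, then solve for phi in closed form (J_n = 0, real rectangular g)."""
    best_metric, best_tau, best_phi = -1.0, 0, 0.0
    for tau in range(sps):
        acc = 0j
        for n, i_n in enumerate(symbols):
            start = n * sps + tau
            y_n = sum(r[start:start + sps])               # y_n(tau) of Equation 5.4-4
            acc += i_n.conjugate() * y_n
        acc /= n0                                         # acc = A(tau) + jB(tau), Equation 5.4-6
        metric = abs(acc) ** 2                            # A^2 + B^2, stationary where 5.4-11 holds
        if metric > best_metric:
            best_metric, best_tau = metric, tau
            best_phi = -math.atan2(acc.imag, acc.real)    # Equation 5.4-10
    return best_phi, best_tau


rng = random.Random(3)
symbols = [cmath.exp(1j * (math.pi / 4 + rng.randrange(4) * math.pi / 2)) for _ in range(200)]
r = make_qpsk(symbols, sps=8, delay=3, phi=0.4, noise_std=0.2, rng=rng)
print(joint_phase_timing_estimate(r, symbols, sps=8))
```

Using `atan2` instead of a plain arctangent resolves the quadrant of the phase estimate, which a ratio $B/A$ alone cannot do.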

Offset QPSK requires a slightly more complex structure for joint estimation of $\phi$
and $\tau$. The structure is easily derived from Equations 5.4-6 to 5.4-11.

In addition to the joint estimates given above, it is also possible to derive non- 
decision-directed estimates of the carrier phase and symbol timing, although we shall 
not pursue this approach. 

We should also mention that one can combine the parameter estimation problem
with the demodulation of the information sequence $\{I_n\}$. Thus, one can consider the
joint maximum-likelihood estimation of $\{I_n\}$, the carrier phase $\phi$, and the symbol timing
parameter $\tau$. Results on these joint estimation problems have appeared in the technical
literature, e.g., Kobayashi (1971), Falconer (1976), and Falconer and Salz (1977).
FIGURE 5.4-1

Decision-directed joint tracking loop for carrier phase and symbol timing in QAM and PSK. 


5.5 

PERFORMANCE CHARACTERISTICS OF ML ESTIMATORS 


The quality of a signal parameter estimate is usually measured in terms of its bias
and its variance. In order to define these terms, let us assume that we have a sequence
of observations $(x_1\ x_2\ x_3\ \cdots\ x_n) = \mathbf{x}$, with PDF $p(\mathbf{x}|\phi)$, from which we extract an
estimate of a parameter $\phi$. The bias of an estimate, say $\hat{\phi}(\mathbf{x})$, is defined as

$$\text{bias} = E[\hat{\phi}(\mathbf{x})] - \phi \tag{5.5-1}$$

where $\phi$ is the true value of the parameter. When $E[\hat{\phi}(\mathbf{x})] = \phi$, we say that the estimate
is unbiased. The variance of the estimate $\hat{\phi}(\mathbf{x})$ is defined as

$$\sigma_{\hat{\phi}}^2 = E\{[\hat{\phi}(\mathbf{x})]^2\} - \{E[\hat{\phi}(\mathbf{x})]\}^2 \tag{5.5-2}$$


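In general $\sigma_{\hat{\phi}}^2$ may be difficult to compute. However, a well-known result in
parameter estimation (see Helstrom, 1968) is the Cramer-Rao lower bound on the mean
square error defined as

$$E\{[\hat{\phi}(\mathbf{x}) - \phi]^2\} \ge \frac{\left[\dfrac{\partial}{\partial\phi}E[\hat{\phi}(\mathbf{x})]\right]^2}{E\left\{\left[\dfrac{\partial}{\partial\phi}\ln p(\mathbf{x}|\phi)\right]^2\right\}} \tag{5.5-3}$$

Note that when the estimate is unbiased, the numerator of Equation 5.5-3 is unity and
the bound becomes a lower bound on the variance $\sigma_{\hat{\phi}}^2$ of the estimate $\hat{\phi}(\mathbf{x})$, i.e.,

$$\sigma_{\hat{\phi}}^2 \ge \frac{1}{E\left\{\left[\dfrac{\partial}{\partial\phi}\ln p(\mathbf{x}|\phi)\right]^2\right\}} \tag{5.5-4}$$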
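Since $\ln p(\mathbf{x}|\phi)$ differs from the log-likelihood function by an additive constant
independent of $\phi$, it follows that

$$E\left\{\left[\frac{\partial}{\partial\phi}\ln p(\mathbf{x}|\phi)\right]^2\right\} = E\left\{\left[\frac{\partial}{\partial\phi}\ln\Lambda(\phi)\right]^2\right\} = -E\left\{\frac{\partial^2}{\partial\phi^2}\ln\Lambda(\phi)\right\} \tag{5.5-5}$$

Therefore, the lower bound on the variance is

$$\sigma_{\hat{\phi}}^2 \ge \frac{1}{E\left\{\left[\dfrac{\partial}{\partial\phi}\ln\Lambda(\phi)\right]^2\right\}} = \frac{-1}{E\left\{\dfrac{\partial^2}{\partial\phi^2}\ln\Lambda(\phi)\right\}} \tag{5.5-6}$$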


This lower bound is a very useful result. It provides a benchmark for comparing 
the variance of any practical estimate to the lower bound. Any estimate that is unbiased 
and whose variance attains the lower bound is called an efficient estimate. 

In general, efficient estimates are rare. When they exist, they are maximum-likelihood
estimates. A well-known result from parameter estimation theory is that
any ML parameter estimate is asymptotically (arbitrarily large number of observations)
unbiased and efficient. To a large extent, these desirable properties constitute the
importance of ML parameter estimates. It is also known that an ML estimate is
asymptotically Gaussian distributed (with mean $\phi$ and variance equal to the lower bound given
by Equation 5.5-6).

In the case of the ML estimates described in this chapter for the two signal parameters,
their variance is generally inversely proportional to the signal-to-noise ratio, or,
equivalently, inversely proportional to the signal power multiplied by the observation
interval $T_0$. Furthermore, the variance of the decision-directed estimates, at low error
probabilities, is generally lower than the variance of non-decision-directed estimates.
In fact, the performance of the ML decision-directed estimates for $\phi$ and $\tau$ attains the
lower bound.

The following example is concerned with the evaluation of the Cramer-Rao lower 
bound for the ML estimate of the carrier phase. 

EXAMPLE 5.5-1. The ML estimate of the phase of an unmodulated carrier was shown
in Equation 5.2-11 to satisfy the condition

$$\int_{T_0} r(t)\sin(2\pi f_c t + \hat{\phi}_{ML})\,dt = 0 \tag{5.5-7}$$

where

$$\begin{aligned}
r(t) &= s(t; \phi) + n(t) \\
&= A\cos(2\pi f_c t + \phi) + n(t)
\end{aligned} \tag{5.5-8}$$

The condition in Equation 5.5-7 was derived by maximizing the log-likelihood function

$$\Lambda_L(\phi) = \frac{2}{N_0}\int_{T_0} r(t)\,s(t; \phi)\,dt \tag{5.5-9}$$
The variance of $\hat{\phi}_{ML}$ is lower-bounded as

$$\begin{aligned}
\sigma_{\hat{\phi}_{ML}}^2 &\ge \left\{\frac{2A}{N_0}\int_{T_0} E[r(t)]\cos(2\pi f_c t + \phi)\,dt\right\}^{-1} = \left\{\frac{A^2}{N_0}\int_{T_0} dt\right\}^{-1} \\
&= \frac{N_0}{A^2 T_0} = \frac{N_0/T_0}{A^2} = \frac{N_0 B_{eq}}{A^2}
\end{aligned} \tag{5.5-10}$$

where the factor $1/T_0$ is simply the (one-sided) equivalent noise bandwidth of the ideal
integrator and $N_0 B_{eq}$ is the total noise power.
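The bound of this example can be checked with a short Monte Carlo sketch. It works in discrete time under assumptions that are not part of the example: $K$ samples at rate $f_s$ span $T_0 = K/f_s$, the noise samples are independent with variance $\sigma^2 = N_0 f_s/2$, and the carrier completes a whole number of cycles in the record. Under those assumptions $N_0/(A^2 T_0)$ becomes $2\sigma^2/(A^2 K)$. The function names are hypothetical.

```python
import math
import random


def crlb_phase(amplitude, sigma, num_samples):
    """Discrete-time form of N0/(A^2 T0): with sigma^2 = N0*fs/2 and T0 = K/fs it is 2 sigma^2/(A^2 K)."""
    return 2.0 * sigma ** 2 / (amplitude ** 2 * num_samples)


def simulate_ml_phase(amplitude, sigma, num_samples, fc_over_fs, phi, trials, rng):
    """Return (mean error, error variance) of the ML phase estimate over Monte Carlo trials."""
    cos_tab = [math.cos(2.0 * math.pi * fc_over_fs * k) for k in range(num_samples)]
    sin_tab = [math.sin(2.0 * math.pi * fc_over_fs * k) for k in range(num_samples)]
    clean = [amplitude * math.cos(2.0 * math.pi * fc_over_fs * k + phi) for k in range(num_samples)]
    errors = []
    for _ in range(trials):
        c = s = 0.0
        for k in range(num_samples):
            x = clean[k] + rng.gauss(0.0, sigma)
            c += x * cos_tab[k]
            s += x * sin_tab[k]
        phi_hat = -math.atan2(s, c)       # solves sum_k r_k sin(2 pi fc k/fs + phi_hat) = 0
        errors.append(phi_hat - phi)
    mean = sum(errors) / trials
    var = sum((e - mean) ** 2 for e in errors) / (trials - 1)
    return mean, var


rng = random.Random(7)
mean_err, var_err = simulate_ml_phase(1.0, 1.0, 200, 0.1, 0.3, 600, rng)
print("measured variance / bound:", var_err / crlb_phase(1.0, 1.0, 200))
```

At this signal-to-noise ratio the measured ratio should be close to 1; at low SNR the arctangent nonlinearity makes the estimate depart from the bound.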

From this example, we observe that the variance of the ML phase estimate is
lower-bounded as

$$\sigma_{\hat{\phi}_{ML}}^2 \ge \frac{1}{\gamma_L} \tag{5.5-11}$$

where $\gamma_L$ is the loop SNR. This is also the variance obtained for the phase estimate from
a PLL with decision-directed estimation. As we have already observed, non-decision-directed
estimates do not perform as well due to losses in the nonlinearities required
to remove the modulation, e.g., the squaring loss and the $M$th-power loss.

Similar results can be obtained on the quality of the symbol timing estimates 
derived above. In addition to their dependence on the SNR, the quality of symbol 
timing estimates is a function of the signal pulse shape. For example, a pulse shape that 
is commonly used in practice is one that has a raised cosine spectrum (see Section 9.2). 
For such a pulse, the rms timing error ($\sigma_\tau$) as a function of SNR is illustrated in
Figure 5.5-1, for both decision-directed and non-decision-directed estimates. Note the 
significant improvement in performance of the decision-directed estimate compared 
with the non-decision-directed estimate. Now, if the bandwidth of the pulse is varied, 
the pulse shape is changed and, hence, the rms value of the timing error also changes. For 
example, when the bandwidth of the pulse that has a raised cosine spectrum is varied, 



FIGURE 5.5-1 

Performance of baseband symbol timing estimate 
for fixed signal and loop bandwidths. [From 
Synchronization Subsystems: Analysis and Design, 
by L. Franks, 1981. Reprinted with permission of the 
author. ] 
Excess bandwidth factor $\beta$
[Bandwidth = $(1 + \beta)/2T$]


FIGURE 5.5-2 

Performance of baseband symbol timing estimate for fixed 
SNR and fixed loop bandwidths. [From Synchronization 
Subsystems: Analysis and Design, by L. Franks, 1981. 
Reprinted with permission of the author.] 


the rms timing error varies as shown in Figure 5.5-2. Note that the error decreases as 
the bandwidth of the pulse increases. 

In conclusion, we have presented the ML method for signal parameter estimation 
and have applied it to the estimation of the carrier phase and symbol timing. We have 
also described their performance characteristics. 


■ 5.6 

BIBLIOGRAPHICAL NOTES AND REFERENCES 

Carrier recovery and timing synchronization are two topics that have been thoroughly 
investigated over the past three decades. The Costas loop was invented in 1956 and the 
decision-directed phase estimation methods were described in Proakis et al. (1964) and 
Natali and Walbesser (1969). The work on decision-directed estimation was motivated 
by earlier work of Price (1962a, b). Comprehensive treatments of phase-locked loops
first appeared in the books by Viterbi (1966) and Gardner (1979). Books that cover
carrier phase recovery and time synchronization techniques have been written by Stiffler
(1971), Lindsey (1972), Lindsey and Simon (1973), Meyr and Ascheid (1990), Simon
et al. (1995), Meyr et al. (1998), and Mengali and D'Andrea (1997).

A number of tutorial papers have appeared in IEEE journals on the PLL and on time 
synchronization. We cite, for example, the paper by Gupta (1975), which treats both 
analog and digital implementation of PLLs, and the paper by Lindsey and Chie (1981), 
which is devoted to the analysis of digital PLLs. In addition, the tutorial paper by Franks 
(1980) describes both carrier phase and symbol synchronization methods, including 
methods based on the maximum-likelihood estimation criterion. The paper by Franks 
is contained in a special issue of the IEEE Transactions on Communications (August 
1980) devoted to synchronization. The paper by Mueller and Muller (1976) describes 
digital signal processing algorithms for extracting symbol timing and the paper by 
Bergmans (1995) evaluates the efficiency of data-aided timing recovery methods. 

Application of the maximum-likelihood criterion to parameter estimation was 
first described in the context of radar parameter estimation (range and range rate). 
Subsequently, this optimal criterion was applied to carrier phase and symbol timing
estimation as well as to joint parameter estimation with data symbols. Papers on these 
topics have been published by several researchers, including Falconer (1976), Mengali 
(1977), Falconer and Salz (1977), and Meyers and Franks (1980). 

The Cramer-Rao lower bound on the variance of a parameter estimate is derived 
and evaluated in a number of standard texts on detection and estimation theory, such 
as Helstrom (1968) and Van Trees (1968). It is also described in several books on
mathematical statistics, such as the book by Cramer (1946). 


PROBLEMS 


5.1 Prove the relation in Equation 5.1-7. 

5.2 Sketch the equivalent realization of the binary PSK receiver in Figure 5.1-1 that employs 
a matched filter instead of a correlator. 


5.3 Suppose that the loop filter (see Equation 5.2-14) for a PLL has the transfer function

$$G(s) = \frac{1}{s + \sqrt{2}}$$

a. Determine the closed-loop transfer function $H(s)$ and indicate if the loop is stable.

b. Determine the damping factor and the natural frequency of the loop.


5.4 Consider the PLL for estimating the carrier phase of a signal in which the loop filter is
specified as

$$G(s) = \frac{K}{1 + \tau_1 s}$$

a. Determine the closed-loop transfer function $H(s)$ and its gain at $f = 0$.

b. For what range of values of $\tau_1$ and $K$ is the loop stable?


5.5 The loop filter $G(s)$ in a PLL is implemented by the circuit shown in Figure P5.5. Determine
the system function $G(s)$ and express the time constants $\tau_1$ and $\tau_2$ in terms of the circuit
parameters.

FIGURE P5.5

[Circuit diagram with elements $R_1$, $R_2$, and $C$ between the input and output terminals.]

5.6 The loop filter $G(s)$ in a PLL is implemented with the active filter shown in Figure P5.6.
Determine the system function $G(s)$ and express the time constants $\tau_1$ and $\tau_2$ in terms of
the circuit parameters.

FIGURE P5.6


5.7 Show that the early-late gate synchronizer illustrated in Figure 5.3-5 is a close approxi- 
mation to the timing recovery system illustrated in Figure P5.7. 



FIGURE P5.7 

5.8 Based on an ML criterion, determine a carrier phase estimation method for binary on-off 
keying modulation. 

5.9 In the transmission and reception of signals to and from moving vehicles, the transmitted
signal frequency is shifted in direct proportion to the speed of the vehicle. The so-called
Doppler frequency shift imparted to a signal that is received in a vehicle traveling at a
velocity $v$ relative to a (fixed) transmitter is given by the formula

$$f_D = \pm\frac{v}{\lambda}$$

where $\lambda$ is the wavelength, and the sign depends on the direction (moving toward or moving
away) that the vehicle is traveling relative to the transmitter. Suppose that a vehicle is
traveling at a speed of 100 km/h relative to a base station in a mobile cellular communication
system. The signal is a narrowband signal transmitted at a carrier frequency of 1 GHz.

a. Determine the Doppler frequency shift.

b. What should be the bandwidth of a Doppler frequency tracking loop if the loop is designed
to track Doppler frequency shifts for vehicles traveling at speeds up to 100 km/h?

c. Suppose the transmitted signal bandwidth is 2 MHz centered at 1 GHz. Determine the 
Doppler frequency spread between the upper and lower frequencies in the signal. 

5.10 Show that the mean value of the ML estimate in Equation 5.2-38 is $\phi$, i.e., that the estimate
is unbiased. 

5.11 Determine the PDF of the ML phase estimate in Equation 5.2-38. 

5.12 Determine the ML phase estimate for offset QPSK. 




5.13 A single-sideband PAM signal may be represented as

$$u_m(t) = A_m\left[g_T(t)\cos 2\pi f_c t - \hat{g}_T(t)\sin 2\pi f_c t\right]$$

where $\hat{g}_T(t)$ is the Hilbert transform of $g_T(t)$ and $A_m$ is the amplitude level that conveys the
information. Demonstrate mathematically that a Costas loop cannot be used to demodulate
the SSB PAM signal.

5.14 A carrier component is transmitted on the quadrature carrier in a communication system
that transmits information via binary PSK. Hence, the received signal has the form

$$r(t) = \pm\sqrt{2P_s}\cos(2\pi f_c t + \phi) + \sqrt{2P_c}\sin(2\pi f_c t + \phi) + n(t)$$

where $\phi$ is the carrier phase and $n(t)$ is AWGN. The unmodulated carrier component is
used as a pilot signal at the receiver to estimate the carrier phase.

a. Sketch a block diagram of the receiver, including the carrier phase estimator.

b. Illustrate mathematically the operations involved in the estimation of the carrier phase $\phi$.

c. Express the probability of error for the detection of the binary PSK signal as a function
of the total transmitted power $P_T = P_s + P_c$. What is the loss in performance due to
the allocation of a portion of the transmitted power to the pilot signal? Evaluate the loss
for $P_c/P_T = 0.1$.

5.15 Determine the signal and noise components at the input to a fourth-power ($M = 4$) PLL
that is used to generate the carrier phase for demodulation of QPSK. By ignoring all noise
components except those that are linear in the noise $n(t)$, determine the variance of the
phase estimate at the output of the PLL.

5.16 The probability of error for binary PSK demodulation and detection when there is a carrier
phase error $\phi_e$ is

$$P_2(\phi_e) = Q\left(\sqrt{\frac{2\mathcal{E}_b}{N_0}\cos^2\phi_e}\right)$$

Suppose that the phase error from the PLL is modeled as a zero-mean Gaussian random
variable with variance $\sigma_\phi^2 \ll \pi$. Determine the expression for the average probability of
error (in integral form).

5.17 Determine the ML estimate of the time delay $\tau$ for the QAM signal of the form

$$s(t) = \mathrm{Re}\left[s_l(t; \tau)e^{j2\pi f_c t}\right]$$

where

$$s_l(t; \tau) = \sum_n I_n g(t - nT - \tau)$$

and $\{I_n\}$ is a sequence of complex-valued data.


5.18 Determine the joint ML estimate of $\tau$ and $\phi$ for a PAM signal.


5.19 Determine the joint ML estimate of $\tau$ and $\phi$ for offset QPSK.



An Introduction to Information Theory 


This chapter deals with fundamental limits on communications. By fundamental 
limits we mean the study of conditions under which the two fundamental tasks in 
communications — compression and transmission — are possible. In this chapter we will 
see that for some important source and channel models, we can precisely state the limits 
for compression and transmission of information. 

In Chapter 4, we considered the optimal detection of digitally modulated signals 
when transmitted through an AWGN channel. We observed that some modulation meth- 
ods provide better performance than others. In particular, we observed that orthogonal 
signaling waveforms allow us to make the probability of error arbitrarily small by letting
the number of waveforms $M \to \infty$, provided that the SNR per bit $\gamma_b > -1.6$ dB.
However, if $\gamma_b$ falls below $-1.6$ dB, then reliable communication is impossible. The
value of $-1.6$ dB is an example of a fundamental limit for communication systems.

We begin this chapter with a study of information sources and source coding. 
Communication systems are designed to transmit the information generated by a source 
to some destination. Information sources may take a variety of different forms. For 
example, in radio broadcasting, the source is generally an audio source (voice or music). 
In TV broadcasting, the information source is a video source whose output is a moving 
image. The outputs of these sources are analog signals and, hence, the sources are 
called analog sources. In contrast, computers and storage devices, such as magnetic or 
optical disks, produce discrete outputs (usually binary or ASCII characters), and hence 
are called discrete sources. 

Whether a source is analog or discrete, a digital communication system is designed 
to transmit information in digital form. Consequently, the output of the source must be 
converted to a format that can be transmitted digitally. This conversion of the source 
output to a digital form is generally performed by the source encoder, whose output 
may be assumed to be a sequence of binary digits. 

In the second half of this chapter we focus on communication channels and trans- 
mission of information. We develop mathematical models for important channels and 
introduce two important parameters for communication channels — channel capacity 
and channel cutoff rate — and elaborate on their meaning and significance. 
Later in Chapters 7 and 8, we consider signal waveforms generated from either
binary or nonbinary sequences. We shall observe that, in general, coded waveforms
offer performance advantages not only in power-limited applications where $R/W < 1$,
but also in bandwidth-limited systems where $R/W > 1$.


■ 6.1 

MATHEMATICAL MODELS FOR INFORMATION SOURCES 


Any information source produces an output that is random; i.e., the source output is 
characterized in statistical terms. Otherwise, if the source output were known exactly, 
there would be no need to transmit it. In this section, we consider both discrete and ana- 
log information sources, and we postulate mathematical models for each type of source. 

The simplest type of a discrete source is one that emits a sequence of letters selected
from a finite alphabet. For example, a binary source emits a binary sequence of the form
100101110..., where the alphabet consists of the two letters {0, 1}. More generally, a
discrete information source with an alphabet of $L$ possible letters, say $\{x_1, x_2, \ldots, x_L\}$,
emits a sequence of letters selected from the alphabet.

To construct a mathematical model for a discrete source, we assume that each letter
in the alphabet $\{x_1, x_2, \ldots, x_L\}$ has a given probability $p_k$ of occurrence. That is,

$$p_k = P[X = x_k], \qquad 1 \le k \le L$$

where

$$\sum_{k=1}^{L} p_k = 1$$

We consider two mathematical models of discrete sources. In the first, we assume 
that the output sequence from the source is statistically independent. That is, the current 
output letter is statistically independent of all past and future outputs. A source whose 
output satisfies the condition of statistical independence among output letters is said 
to be memoryless. If the source is discrete, it is called a discrete memoryless source 
(DMS). The mathematical model for a DMS is a sequence of iid random variables $\{X_i\}$.

If the output of the discrete source is statistically dependent, such as English text, 
we may construct a mathematical model based on statistical stationarity. By definition, 
a discrete source is said to be stationary if the joint probabilities of two sequences of 
length $n$, say, $a_1, a_2, \ldots, a_n$ and $a_{1+m}, a_{2+m}, \ldots, a_{n+m}$, are identical for all $n \ge 1$ and
for all shifts m. In other words, the joint probabilities for any arbitrary length sequence 
of source outputs are invariant under a shift in the time origin. 

An analog source has an output waveform $x(t)$ that is a sample function of a
stochastic process $X(t)$. We assume that $X(t)$ is a stationary stochastic process with
autocorrelation function $R_X(\tau)$ and power spectral density $S_X(f)$. When $X(t)$ is a
band-limited stochastic process, i.e., $S_X(f) = 0$ for $|f| \ge W$, the sampling theorem
may be used to represent $X(t)$ as

$$X(t) = \sum_{n=-\infty}^{\infty} X\left(\frac{n}{2W}\right)\mathrm{sinc}\left[2W\left(t - \frac{n}{2W}\right)\right] \tag{6.1-1}$$
where $\{X(n/2W)\}$ denote the samples of the process $X(t)$ taken at the sampling
(Nyquist) rate of $f_s = 2W$ samples/s. Thus, by applying the sampling theorem, we
may convert the output of an analog source to an equivalent discrete-time source. Then
the source output is characterized statistically by the joint PDF $p(x_1, x_2, \ldots, x_m)$ for
all $m \ge 1$, where $X_n = X(n/2W)$, $1 \le n \le m$, are the random variables corresponding
to the samples of $X(t)$.
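The interpolation formula of Equation 6.1-1 can be tried numerically on a deterministic band-limited waveform. The sketch below is only an illustration under stated assumptions: the infinite sum is truncated to $|n| \le n_{max}$, $\mathrm{sinc}(u) = \sin(\pi u)/(\pi u)$, and the test waveform $\mathrm{sinc}^2(Wt)$ is chosen because its spectrum is confined to $|f| \le W$ and its samples decay quickly. The helper names are hypothetical.

```python
import math


def sinc(u):
    """Normalized sinc: sin(pi u)/(pi u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def reconstruct(sample_at, w, t, n_max):
    """Truncated form of Equation 6.1-1: sum over |n| <= n_max of x(n/2W) sinc[2W(t - n/2W)]."""
    return sum(sample_at(n) * sinc(2.0 * w * (t - n / (2.0 * w)))
               for n in range(-n_max, n_max + 1))


W = 4.0                                          # bandwidth in Hz (arbitrary choice)
x = lambda t: sinc(W * t) ** 2                   # band-limited to |f| <= W
samples = lambda n: x(n / (2.0 * W))             # Nyquist-rate samples x(n/2W)

t0 = 0.0925                                      # an instant between two sampling instants
print(x(t0), reconstruct(samples, W, t0, n_max=200))
```

The truncation error depends on how fast the samples decay; a pure sinusoid, whose samples do not decay, would need far more terms for the same accuracy.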

We note that the output samples $\{X(n/2W)\}$ from the stationary sources are generally
continuous, and hence they cannot be represented in digital form without some
loss in precision. For example, we may quantize each sample to a set of discrete values, 
but the quantization process results in loss of precision, and consequently the original 
signal cannot be reconstructed exactly from the quantized sample values. Later in this 
chapter, we shall consider the distortion resulting from quantization of the samples 
from an analog source. 


■ 6.2 

A LOGARITHMIC MEASURE OF INFORMATION 


To develop an appropriate measure of information, let us consider two discrete random
variables $X$ and $Y$ with possible outcomes in the alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively.
Suppose we observe some outcome $Y = y$ and we wish to determine, quantitatively,
the amount of information that the occurrence of the event $Y = y$ provides about
the event $X = x$. We observe that when $X$ and $Y$ are statistically independent, the
occurrence of $Y = y$ provides no information about the occurrence of the event $X = x$.
On the other hand, when $X$ and $Y$ are fully dependent such that the occurrence of
$Y = y$ determines the occurrence of $X = x$, then the information content is simply that
provided by the event $X = x$. A suitable measure that agrees with the intuitive notion
of information is the logarithm of the ratio of the conditional probability

$$P[X = x \mid Y = y] \triangleq P[x \mid y]$$

divided by the probability

$$P[X = x] \triangleq P[x]$$

That is, the information content provided by the occurrence of the event $Y = y$ about
the event $X = x$ is defined as

$$I(x; y) = \log\frac{P[x \mid y]}{P[x]} \tag{6.2-1}$$


$I(x; y)$ is called the mutual information between $x$ and $y$. The mutual information
between random variables $X$ and $Y$ is defined as the average of $I(x; y)$ and is given by

$$\begin{aligned}
I(X; Y) &= \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} P[X = x, Y = y]\,I(x; y) \\
&= \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} P[X = x, Y = y]\log\frac{P[x \mid y]}{P[x]}
\end{aligned} \tag{6.2-2}$$
The units of $I(X; Y)$ are determined by the base of the logarithm, which is usually
selected as either 2 or $e$. When the base of the logarithm is 2, the units of $I(X; Y)$ are
bits, and when the base is $e$, the units of $I(X; Y)$ are called nats (natural units). (The
standard abbreviation for $\log_e$ is $\ln$.) Since

$$\ln a = \ln 2 \log_2 a = 0.69315 \log_2 a$$

the information measured in nats is equal to $\ln 2$ times the information measured in
bits.

Some of the most important properties of the mutual information are given below. 
Some of these properties are proved in problems at the end of this chapter. 

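1. $I(X; Y) = I(Y; X)$

2. $I(X; Y) \ge 0$, with equality if and only if $X$ and $Y$ are independent

3. $I(X; Y) \le \min\{|\mathcal{X}|, |\mathcal{Y}|\}$, where $|\mathcal{X}|$ and $|\mathcal{Y}|$ denote the size of the alphabets (the tighter bound $I(X; Y) \le \min\{\log|\mathcal{X}|, \log|\mathcal{Y}|\}$ also holds)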
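The definition in Equation 6.2-2 and the properties listed above can be checked numerically for small alphabets. The sketch below assumes the joint PMF is supplied as a nested list whose rows are indexed by $x$ and columns by $y$, and it uses the identity $P[x \mid y]/P[x] = P[x, y]/(P[x]P[y])$; the function name `mutual_information` is hypothetical.

```python
import math


def mutual_information(joint, base=2.0):
    """I(X;Y) from a joint PMF given as rows over x and columns over y (bits when base=2)."""
    px = [sum(row) for row in joint]                 # marginal PMF of X
    py = [sum(col) for col in zip(*joint)]           # marginal PMF of Y
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                            # terms with zero probability contribute nothing
                total += pxy * math.log(pxy / (px[i] * py[j]), base)
    return total


# Uniform binary input through a channel that flips the symbol with probability 0.1
bsc_joint = [[0.45, 0.05],
             [0.05, 0.45]]
print("I(X;Y) in bits:", mutual_information(bsc_joint))
print("I(X;Y) in nats:", mutual_information(bsc_joint, base=math.e))
```

The value in nats equals $\ln 2$ times the value in bits, in agreement with the conversion stated above.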

When the random variables $X$ and $Y$ are statistically independent, $P[x \mid y] = P[x]$
and hence $I(X; Y) = 0$. On the other hand, when the occurrence of the event $Y = y$
uniquely determines the occurrence of the event $X = x$, the conditional probability in
the numerator of Equation 6.2-1 is unity, hence

$$I(x; y) = \log\frac{1}{P[X = x]} = -\log P[X = x] \tag{6.2-3}$$

and 


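$$\begin{aligned}
I(X; Y) &= -\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} P[X = x, Y = y]\log P[X = x] \\
&= -\sum_{x\in\mathcal{X}} P[X = x]\log P[X = x]
\end{aligned} \tag{6.2-4}$$

The value of $I(X; Y)$ under this condition, which is denoted $H(X)$ and is defined by

$$H(X) = -\sum_{x\in\mathcal{X}} P[X = x]\log P[X = x] \tag{6.2-5}$$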

is called the entropy of the random variable $X$ and is a measure of uncertainty or
ambiguity in $X$. Since knowledge of $X$ completely removes uncertainty about it, $H(X)$
is also a measure of information that is acquired by knowledge of $X$, or the information
content of $X$ per source output. The unit for entropy is bits (or nats) per symbol, or per
source output. Note that in the definition of entropy, we define $0 \log 0 = 0$. It is also
important to note that both entropy and mutual information depend on the probabilities
of the random variables and not on the values the random variables take.

If an information source is deterministic, i.e., for one value of $X$ the probability
is equal to 1 and for all other values of $X$ the probability is equal to 0, the entropy of
the source is equal to zero, i.e., there is no ambiguity in this source, and the source
does not convey any information. In Problem 6.3 we show that for a DMS with
alphabet size $|\mathcal{X}|$, the entropy is maximized when all outputs are equiprobable. In this
case $H(X) = \log |\mathcal{X}|$.
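
As a numerical illustration of Equations 6.2-2 and 6.2-5, the following Python sketch evaluates the entropy and the mutual information in bits. The function names and the two small joint PMFs are assumptions chosen for illustration; they do not come from the text.

```python
from math import log2

def entropy(pmf):
    """H(X) = -sum p log2 p in bits, with 0 log 0 taken as 0."""
    return -sum(p * log2(p) for p in pmf if p > 0)

def mutual_information(joint):
    """I(X;Y) in bits; joint[i][j] holds P[X = x_i, Y = y_j]."""
    px = [sum(row) for row in joint]          # marginal PMF of X
    py = [sum(col) for col in zip(*joint)]    # marginal PMF of Y
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                # P[x|y] / P[x] equals P[x, y] / (P[x] P[y])
                total += pxy * log2(pxy / (px[i] * py[j]))
    return total

# Hypothetical joint PMFs: independent variables, and Y determining X.
independent = [[0.25, 0.25], [0.25, 0.25]]
identical = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))   # independent case
print(mutual_information(identical))     # equals H(X) for a fair binary X
print(entropy([0.25] * 4))               # equiprobable alphabet of size 4
```

In the second case the mutual information coincides with the entropy of $X$, in line with Equation 6.2-4.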

FIGURE 6.2-1

The binary entropy function. [Plot of $H_b(p)$ versus probability $p$.]


The most important properties of the entropy functions are as follows:

1. $0 \le H(X) \le \log |\mathcal{X}|$

2. $I(X; X) = H(X)$

3. $I(X; Y) \le \min\{H(X), H(Y)\}$

4. If $Y = g(X)$, then $H(Y) \le H(X)$

EXAMPLE 6.2-1. For a binary source with probabilities $p$ and $1 - p$ we have

$$H(X) = -p \log p - (1 - p) \log(1 - p) \tag{6.2-6}$$

This function is called the binary entropy function and is denoted by $H_b(p)$. A plot of
$H_b(p)$ is shown in Figure 6.2-1.
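
A minimal sketch of Equation 6.2-6 in Python, using base-2 logarithms so that the result is in bits. The function name `binary_entropy` is an assumption made for illustration.

```python
from math import log2

def binary_entropy(p):
    """H_b(p) in bits for a binary source with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0   # convention 0 log 0 = 0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Sample a few points of the curve sketched in Figure 6.2-1.
for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(p, round(binary_entropy(p), 4))
```

The function is symmetric about $p = 1/2$, where it attains its maximum of 1 bit.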


Joint and Conditional Entropy 

The entropy of a pair of random variables $(X, Y)$, called the joint entropy of $X$ and $Y$,
is defined as an extension of the entropy of a single random variable as

$$H(X, Y) = -\sum_{(x, y) \in \mathcal{X} \times \mathcal{Y}} P[X = x, Y = y] \log P[X = x, Y = y] \tag{6.2-7}$$

When the value of random variable $X$ is known to be $x$, the PMF of $Y$ becomes
$P[Y = y \mid X = x]$ and the entropy of $Y$ under this condition becomes

$$H(Y \mid X = x) = -\sum_{y \in \mathcal{Y}} P[Y = y \mid X = x] \log P[Y = y \mid X = x] \tag{6.2-8}$$

The average of this quantity over all possible values of $X$ is denoted by $H(Y \mid X)$ and is
called the conditional entropy of $Y$ given $X$:

$$\begin{aligned}
H(Y \mid X) &= \sum_{x \in \mathcal{X}} P[X = x]\, H(Y \mid X = x) \\
&= -\sum_{(x, y) \in \mathcal{X} \times \mathcal{Y}} P[X = x, Y = y] \log P[Y = y \mid X = x]
\end{aligned} \tag{6.2-9}$$

From Equations 6.2-7 and 6.2-9 it is easy to verify that

$$H(X, Y) = H(X) + H(Y \mid X) \tag{6.2-10}$$

Some of the important properties of joint and conditional entropy are summarized
below.

1. $0 \le H(X \mid Y) \le H(X)$, with $H(X \mid Y) = H(X)$ if and only if $X$ and $Y$ are independent.

2. $H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y) \le H(X) + H(Y)$, with equality
$H(X, Y) = H(X) + H(Y)$ if and only if $X$ and $Y$ are independent.

3. $I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y)$.
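
The identities above can be checked numerically. The sketch below uses a hypothetical $2 \times 2$ joint PMF (an assumption, not taken from the text), computes $H(Y \mid X)$ directly from Equation 6.2-9, and compares it with Equation 6.2-10.

```python
from math import log2

def H(probs):
    """Entropy in bits of a list of probabilities (0 log 0 = 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical joint PMF, joint[i][j] = P[X = x_i, Y = y_j].
joint = [[0.30, 0.10],
         [0.15, 0.45]]

px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
H_xy = H([p for row in joint for p in row])          # Equation 6.2-7

# Equation 6.2-9: average of H(Y | X = x) over the PMF of X.
H_y_given_x = sum(px[i] * H([p / px[i] for p in joint[i]])
                  for i in range(len(joint)) if px[i] > 0)
H_x_given_y = H_xy - H(py)                           # property 2
I_xy = H(px) + H(py) - H_xy                          # property 3

print(H_xy, H(px) + H_y_given_x)   # the two values agree (Equation 6.2-10)
print(I_xy, H(px) - H_x_given_y)   # two forms of the mutual information
```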

The notion of joint and conditional entropy can be extended to multiple random 
variables. For joint entropy we have 

$$H(X_1, X_2, \ldots, X_n) = -\sum_{x_1, x_2, \ldots, x_n} P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n] \log P[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n] \tag{6.2-11}$$

The following relation between joint and conditional entropies is known as the chain
rule for entropies.

$$H(X_1, X_2, \ldots, X_n) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1, X_2) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1}) \tag{6.2-12}$$

Using the above relation and the first property of the conditional entropy, we have

$$H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i) \tag{6.2-13}$$

with equality if and only if the $X_i$'s are statistically independent. If the $X_i$'s are iid, we clearly have

$$H(X_1, X_2, \ldots, X_n) = n H(X) \tag{6.2-14}$$

where $H(X)$ denotes the common value of the entropy of the $X_i$'s.


6.3 LOSSLESS CODING OF INFORMATION SOURCES

The goal of data compression is to represent a source with the fewest bits such that best 
recovery of the source from the compressed data is possible. Data compression can be 
broadly classified into lossless and lossy compression. In lossless compression the goal 
is to minimize the number of bits in such a way that perfect (lossless) reconstruction 
of the source from compressed data is possible. In lossy data compression the data 
are compressed subject to a maximum tolerable distortion. In this section we study 
the fundamental bounds for lossless compression as well as some common lossless 
compression algorithms. 

6.3-1 The Lossless Source Coding Theorem


Let us assume that a DMS is represented by independent replicas of random variable
$X$ taking values in the set $\mathcal{X} = \{a_1, a_2, \ldots, a_N\}$ with corresponding probabilities
$p_1, p_2, \ldots, p_N$. Let $\mathbf{x}$ denote an output sequence of length $n$ for this source, where
$n$ is assumed to be large. We call this sequence a typical sequence if the number of
occurrences of each $a_i$ in $\mathbf{x}$ is roughly $n p_i$ for $1 \le i \le N$. The set of typical sequences
is denoted by $A$.

The law of large numbers, reviewed in Section 2.5, states that with high probability
approaching 1 as $n \to \infty$, outputs of any DMS will be typical. Since the number of
occurrences of $a_i$ in $\mathbf{x}$ is roughly $n p_i$ and the source is memoryless, we have

$$\begin{aligned}
\log P[\mathbf{X} = \mathbf{x}] &\approx \log \prod_{i=1}^{N} p_i^{\,n p_i} \\
&= \sum_{i=1}^{N} n p_i \log p_i \\
&= -n H(X)
\end{aligned} \tag{6.3-1}$$

Hence,

$$P[\mathbf{X} = \mathbf{x}] \approx 2^{-nH(X)} \tag{6.3-2}$$

This states that all typical sequences have roughly the same probability, and this common
probability is $2^{-nH(X)}$.

Since the probability of the typical sequences, for large $n$, is very close to 1, we
conclude that the number of typical sequences, i.e., the cardinality of $A$, is roughly

$$|A| \approx 2^{nH(X)} \tag{6.3-3}$$
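
The approximation in Equation 6.3-2 can be observed by simulation. The sketch below draws one long output sequence from a hypothetical three-letter DMS (the PMF, sequence length and seed are assumptions chosen for illustration) and compares $-\frac{1}{n} \log_2 P[\mathbf{X} = \mathbf{x}]$ with $H(X)$.

```python
import random
from math import log2

def empirical_rate(pmf, n, seed=0):
    """Return -(1/n) log2 P[x] for one simulated length-n DMS output."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    symbols = rng.choices(range(len(pmf)), weights=pmf, k=n)
    log_prob = sum(log2(pmf[s]) for s in symbols)   # memoryless source
    return -log_prob / n

pmf = [0.5, 0.25, 0.25]                       # hypothetical source PMF
source_entropy = -sum(p * log2(p) for p in pmf)
rate = empirical_rate(pmf, n=20000)

print(source_entropy, rate)   # the two values are close for large n
```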


This discussion shows that for large $n$, a subset of all possible sequences, called
the typical sequences, is almost certain to occur. Therefore, for transmission of source
outputs it is sufficient to consider only this subset. Since the number of typical sequences
is $2^{nH(X)}$, for their transmission $nH(X)$ bits are sufficient, and therefore the number of
required bits per source output, i.e., the transmission rate, is given by

$$R \approx \frac{nH(X)}{n} = H(X) \quad \text{bits per transmission} \tag{6.3-4}$$


The informal argument given above can be made rigorous (see the books by Cover 
and Thomas (2006) and Gallager (1968)) in the following theorem first stated by 
Shannon (1948). 


SHANNON'S FIRST THEOREM (LOSSLESS SOURCE CODING THEOREM) Let $X$ denote a
DMS with entropy $H(X)$. There exists a lossless source code for this source at any rate $R$
if $R > H(X)$. There exists no lossless code for this source at rates less than $H(X)$.

This theorem sets a fundamental limit on lossless source coding and shows that the 
entropy of a DMS, which was defined previously based on intuitive reasoning, plays a 
fundamental role in lossless compression of information sources. 

Discrete Stationary Sources

We have seen that the entropy of a DMS sets a fundamental limit on the rate at which the 
source can be losslessly compressed. In this section, we consider discrete sources for 
which the sequence of output letters is statistically dependent. We limit our treatment 
to sources that are statistically stationary. 

Let us evaluate the entropy of any sequence of letters from a stationary source.
From the chain rule for the entropies stated in Equation 6.2-12, the entropy of a block
of random variables $X_1 X_2 \cdots X_k$ is

$$H(X_1 X_2 \cdots X_k) = \sum_{i=1}^{k} H(X_i \mid X_1 X_2 \cdots X_{i-1}) \tag{6.3-5}$$

where $H(X_i \mid X_1 X_2 \cdots X_{i-1})$ is the conditional entropy of the $i$th symbol from the
source, given the previous $i - 1$ symbols. The entropy per letter for the $k$-symbol block
is defined as

$$H_k(X) = \frac{1}{k} H(X_1 X_2 \cdots X_k) \tag{6.3-6}$$

We define the entropy rate of a stationary source as the entropy per letter in
Equation 6.3-6 in the limit as $k \to \infty$. That is,

$$H_\infty(X) = \lim_{k \to \infty} H_k(X) = \lim_{k \to \infty} \frac{1}{k} H(X_1 X_2 \cdots X_k) \tag{6.3-7}$$

The existence of this limit is established below.

As an alternative, we may define the entropy rate of the source in terms of the
conditional entropy $H(X_k \mid X_1 X_2 \cdots X_{k-1})$ in the limit as $k$ approaches infinity. Fortunately,
this limit also exists and is identical to the limit in Equation 6.3-7. That is,

$$H_\infty(X) = \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{6.3-8}$$

This result is also established below. Our development follows the approach in Gallager
(1968).

First, we show that

$$H(X_k \mid X_1 X_2 \cdots X_{k-1}) \le H(X_{k-1} \mid X_1 X_2 \cdots X_{k-2}) \tag{6.3-9}$$

for $k \ge 2$. From our previous result that conditioning on a random variable cannot
increase entropy, we have

$$H(X_k \mid X_1 X_2 \cdots X_{k-1}) \le H(X_k \mid X_2 X_3 \cdots X_{k-1}) \tag{6.3-10}$$

From the stationarity of the source, we have

$$H(X_k \mid X_2 X_3 \cdots X_{k-1}) = H(X_{k-1} \mid X_1 X_2 \cdots X_{k-2}) \tag{6.3-11}$$

Hence, Equation 6.3-9 follows immediately. This result demonstrates that
$H(X_k \mid X_1 X_2 \cdots X_{k-1})$ is a nonincreasing sequence in $k$.

Second, we have the result

$$H_k(X) \ge H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{6.3-12}$$

which follows immediately from Equations 6.3-5 and 6.3-6 and the fact that the last
term in the sum of Equation 6.3-5 is a lower bound on each of the other $k - 1$ terms.
Third, from the definition of $H_k(X)$, we may write

$$\begin{aligned}
H_k(X) &= \frac{1}{k} \left[ H(X_1 X_2 \cdots X_{k-1}) + H(X_k \mid X_1 \cdots X_{k-1}) \right] \\
&= \frac{1}{k} \left[ (k - 1) H_{k-1}(X) + H(X_k \mid X_1 \cdots X_{k-1}) \right] \\
&\le \frac{k - 1}{k} H_{k-1}(X) + \frac{1}{k} H_k(X)
\end{aligned} \tag{6.3-13}$$

which reduces to

$$H_k(X) \le H_{k-1}(X) \tag{6.3-14}$$

Hence, $H_k(X)$ is a nonincreasing sequence in $k$.

Since $H_k(X)$ and the conditional entropy $H(X_k \mid X_1 \cdots X_{k-1})$ are both nonnegative
and nonincreasing with $k$, both limits must exist. Their limiting forms can be established
by using Equations 6.3-5 and 6.3-6 to express $H_{k+j}(X)$ as

$$\begin{aligned}
H_{k+j}(X) = {} & \frac{1}{k + j} H(X_1 X_2 \cdots X_{k-1}) \\
& + \frac{1}{k + j} \left[ H(X_k \mid X_1 \cdots X_{k-1}) + H(X_{k+1} \mid X_1 \cdots X_k) + \cdots + H(X_{k+j} \mid X_1 \cdots X_{k+j-1}) \right]
\end{aligned} \tag{6.3-15}$$

Since the conditional entropy is nonincreasing, the first term in the square brackets
serves as an upper bound on the other terms. Hence,

$$H_{k+j}(X) \le \frac{1}{k + j} H(X_1 X_2 \cdots X_{k-1}) + \frac{j + 1}{k + j} H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{6.3-16}$$

For a fixed $k$, the limit of Equation 6.3-16 as $j \to \infty$ yields

$$H_\infty(X) \le H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{6.3-17}$$

But Equation 6.3-17 is valid for all $k$; hence, it is valid for $k \to \infty$. Therefore,

$$H_\infty(X) \le \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{6.3-18}$$

On the other hand, from Equation 6.3-12, we obtain in the limit as $k \to \infty$

$$H_\infty(X) \ge \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1}) \tag{6.3-19}$$

which establishes Equation 6.3-8.

From the discussion above the entropy rate of a discrete stationary source is defined as

$$H_\infty(X) = \lim_{k \to \infty} H(X_k \mid X_1, X_2, \ldots, X_{k-1}) = \lim_{k \to \infty} \frac{1}{k} H(X_1, X_2, \ldots, X_k) \tag{6.3-20}$$

It is clear from above that if the source is memoryless, the entropy rate is equal to the 
entropy of the source. 

For discrete stationary sources, the entropy rate is the fundamental rate for compres- 
sion of the source such that lossless recovery is possible. Therefore, a lossless coding 
theorem for discrete stationary sources, similar to the one for discrete memoryless 
sources, exists that states lossless compression of the source at rates above the entropy 
rate is possible, but lossless compression at rates below the entropy rate is impossible. 
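
To make the two limits in Equation 6.3-20 concrete, the sketch below evaluates $H_k(X)$ and $H(X_k \mid X_1 \cdots X_{k-1})$ for a small source with memory. The two-state stationary Markov source, its transition matrix `P` and its stationary distribution `pi` are assumptions introduced purely for illustration; the text itself treats general stationary sources.

```python
from itertools import product
from math import log2

# Hypothetical two-state Markov source: P[i][j] = P[X_{t+1} = j | X_t = i].
P = [[0.9, 0.1],
     [0.3, 0.7]]
pi = [0.75, 0.25]   # stationary distribution, satisfies pi = pi P

def block_entropy(k):
    """H(X_1 ... X_k) in bits, by enumerating all 2^k output blocks."""
    h = 0.0
    for seq in product(range(2), repeat=k):
        p = pi[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        if p > 0:
            h -= p * log2(p)
    return h

ks = range(2, 11)
per_letter = {k: block_entropy(k) / k for k in ks}                       # H_k(X)
conditional = {k: block_entropy(k) - block_entropy(k - 1) for k in ks}   # H(X_k | past)

for k in ks:
    print(k, round(per_letter[k], 4), round(conditional[k], 4))
```

For this source the entropy per letter decreases with $k$ and stays above the conditional entropy, as Equations 6.3-12 and 6.3-14 require.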


6.3-2 Lossless Coding Algorithms 

In this section we study two main approaches for lossless compression of discrete
information sources: the Huffman coding algorithm and the Lempel-Ziv algorithm.
The Huffman coding algorithm is an example of a variable-length coding algorithm,
and the Lempel-Ziv algorithm is a variable-to-fixed-length coding algorithm.

Variable-Length Source Coding 

When the source symbols are not equally probable, an efficient encoding method is 
to use variable-length code words. An example of such encoding is the Morse code, 
which dates back to the nineteenth century. In the Morse code, the letters that occur more 
frequently are assigned short code words, and those that occur infrequently are assigned 
long code words. Following this general philosophy, we may use the probabilities of 
occurrence of the different source letters in the selection of the code words. The problem 
is to devise a method for selecting and assigning the code words to source letters. This 
type of encoding is called entropy coding. 

For example, suppose that a DMS with output letters $a_1, a_2, a_3, a_4$ and corresponding
probabilities $P(a_1) = \frac{1}{2}$, $P(a_2) = \frac{1}{4}$, and $P(a_3) = P(a_4) = \frac{1}{8}$ is encoded as
shown in Table 6.3-1. Code I is a variable-length code that has a basic flaw. To see the
flaw, suppose we are presented with the sequence 001001. Clearly, the first symbol
corresponding to 00 is $a_2$. However, the next 4 bits are ambiguous (not uniquely decodable).
They may be decoded either as $a_4 a_3$ or as $a_1 a_2 a_1$. Perhaps, the ambiguity can be
resolved by waiting for additional bits, but such a decoding delay is highly undesirable.
We shall consider only codes that are decodable instantaneously, i.e., without any
decoding delay. Such codes are called instantaneous codes.


TABLE 6.3-1

Variable-Length Codes.

Letter   P[a_k]   Code I   Code II   Code III
a_1      1/2      1        0         0
a_2      1/4      00       10        01
a_3      1/8      01       110       011
a_4      1/8      10       111       111


FIGURE 6.3-1

Code tree for code II in Table 6.3-1.

Code II in Table 6.3-1 is uniquely decodable and instantaneous. It is convenient to
represent the code words in this code graphically as terminal nodes of a tree, as shown
in Figure 6.3-1. We observe that the digit 0 indicates the end of a code word for the first
three code words. This characteristic plus the fact that no code word is longer than three
binary digits makes this code instantaneously decodable. Note that no code word in this
code is a prefix of any other code word. In general, the prefix condition requires that
for a given code word $C_k$ of length $k$ having elements $(b_1, b_2, \ldots, b_k)$, there is no other
code word of length $l < k$ with elements $(b_1, b_2, \ldots, b_l)$ for $1 \le l \le k - 1$. In other
words, there is no code word of length $l < k$ that is identical to the first $l$ binary digits
of another code word of length $k > l$. This property makes the code words uniquely
and instantaneously decodable.

Code III given in Table 6.3-1 has the tree structure shown in Figure 6.3-2. We
note that in this case the code is uniquely decodable but not instantaneously decodable.
Clearly, this code does not satisfy the prefix condition.
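
The three codes of Table 6.3-1 can be examined mechanically. The sketch below tests the prefix condition and counts the number of ways a bit string can be split into code words; the helper names `is_prefix_free` and `count_parsings` are assumptions made for illustration.

```python
def is_prefix_free(codewords):
    """True if no code word is a prefix of a different code word."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)

def count_parsings(bits, codewords):
    """Number of ways `bits` can be split into a sequence of code words."""
    words = list(codewords)
    ways = [1] + [0] * len(bits)    # ways[n]: parsings of the first n bits
    for end in range(1, len(bits) + 1):
        for w in words:
            if end >= len(w) and bits[end - len(w):end] == w:
                ways[end] += ways[end - len(w)]
    return ways[-1]

code_I = {"a1": "1", "a2": "00", "a3": "01", "a4": "10"}
code_II = {"a1": "0", "a2": "10", "a3": "110", "a4": "111"}
code_III = {"a1": "0", "a2": "01", "a3": "011", "a4": "111"}

for name, code in (("I", code_I), ("II", code_II), ("III", code_III)):
    print(name, is_prefix_free(code.values()))

# The sequence 001001 from the text has more than one parsing under Code I.
print(count_parsings("001001", code_I.values()))
```

Only Code II passes the prefix test. Code III fails it even though it is uniquely decodable, which is why its decoding cannot be instantaneous.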

Our main objective is to devise a systematic procedure for constructing uniquely
decodable variable-length codes that are efficient in the sense that the average number
of bits per source letter, defined as the quantity

$$\bar{R} = \sum_{k=1}^{L} n_k P(a_k) \tag{6.3-21}$$

is minimized. The conditions for the existence of a code that satisfies the prefix condition
are given by the Kraft inequality.

The Kraft Inequality

The Kraft inequality states that a necessary and sufficient condition for the existence
of a binary code with code words having lengths $n_1 \le n_2 \le \cdots \le n_L$ that satisfy the
prefix condition is

$$\sum_{k=1}^{L} 2^{-n_k} \le 1 \tag{6.3-22}$$


FIGURE 6.3-2

Code tree for code III in Table 6.3-1.

First, we prove that Equation 6.3-22 is a sufficient condition for the existence of
a code that satisfies the prefix condition. To construct such a code, we begin with a
full binary tree of order $n = n_L$ that has $2^n$ terminal nodes and two nodes of order $k$
stemming from each node of order $k - 1$, for each $k$, $1 \le k \le n$. Let us select any
node of order $n_1$ as the first code word $c_1$. This choice eliminates $2^{n - n_1}$ terminal nodes
(or the fraction $2^{-n_1}$ of the $2^n$ terminal nodes). From the remaining available nodes of
order $n_2$, we select one node for the second code word $c_2$. This choice eliminates $2^{n - n_2}$
terminal nodes (or the fraction $2^{-n_2}$ of the $2^n$ terminal nodes). This process continues
until the last code word is assigned at terminal node $n = n_L$. Since, at the node of order
$j < L$, the fraction of the number of terminal nodes eliminated is

$$\sum_{k=1}^{j} 2^{-n_k} < \sum_{k=1}^{L} 2^{-n_k} \le 1 \tag{6.3-23}$$

there is always a node of order $k > j$ available to be assigned to the next code word.
Thus, we have constructed a code tree that is embedded in the full tree of $2^n$ nodes
as illustrated in Figure 6.3-3, for a tree having 16 terminal nodes and a source output
consisting of five letters with $n_1 = 1$, $n_2 = 2$, $n_3 = 3$, and $n_4 = n_5 = 4$.

To prove that Equation 6.3-22 is a necessary condition, we observe that in the code
tree of order $n = n_L$, the number of terminal nodes eliminated from the total number
of $2^n$ terminal nodes is

$$\sum_{k=1}^{L} 2^{n - n_k} \le 2^n \tag{6.3-24}$$

Hence,

$$\sum_{k=1}^{L} 2^{-n_k} \le 1 \tag{6.3-25}$$

and the proof of the Kraft inequality is complete.
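
The sufficiency argument is constructive, and can be sketched in code. The version below is one possible realization, not the only one: it processes the lengths in increasing order and always takes the leftmost node still available at the required depth. The function names are assumptions made for illustration.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of the sum of 2^(-n_k) over the code word lengths."""
    return sum(Fraction(1, 2 ** n) for n in lengths)

def prefix_code_from_lengths(lengths):
    """Build a binary prefix code with the given positive code word lengths."""
    if kraft_sum(lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    codewords = []
    next_node = 0    # index of the leftmost free node at the current depth
    depth = 0
    for n in sorted(lengths):
        next_node <<= (n - depth)          # descend to depth n in the tree
        codewords.append(format(next_node, f"0{n}b"))
        next_node += 1                     # the subtree just used is eliminated
        depth = n
    return codewords

# Lengths used for the tree of Figure 6.3-3.
lengths = [1, 2, 3, 4, 4]
print(kraft_sum(lengths))
print(prefix_code_from_lengths(lengths))
```

Requesting lengths such as 1, 1, 2 raises an error, since their Kraft sum exceeds 1 and no prefix code with those lengths exists.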

The Kraft inequality may be used to prove the following version of the lossless 
source coding theorem, which applies to codes that satisfy the prefix condition. 



FIGURE 6.3-3 

Construction of binary tree code embedded in a full tree. 

SOURCE CODING THEOREM FOR PREFIX CODES Let $X$ be a DMS with finite entropy
$H(X)$ and output letters $a_i$, $1 \le i \le N$, with corresponding probabilities of occurrence
$p_i$, $1 \le i \le N$. It is possible to construct a code that satisfies the prefix condition and
has an average length $\bar{R}$ that satisfies the inequalities

$$H(X) \le \bar{R} < H(X) + 1 \tag{6.3-26}$$

To establish the lower bound in Equation 6.3-26, we note that for code words that have
length $n_i$, $1 \le i \le N$, the difference $H(X) - \bar{R}$ may be expressed as

$$\begin{aligned}
H(X) - \bar{R} &= \sum_{i=1}^{N} p_i \log_2 \frac{1}{p_i} - \sum_{i=1}^{N} p_i n_i \\
&= \sum_{i=1}^{N} p_i \log_2 \frac{2^{-n_i}}{p_i}
\end{aligned} \tag{6.3-27}$$

Use of the inequality $\ln x \le x - 1$ in Equation 6.3-27 yields

$$\begin{aligned}
H(X) - \bar{R} &\le (\log_2 e) \sum_{i=1}^{N} p_i \left( \frac{2^{-n_i}}{p_i} - 1 \right) \\
&\le (\log_2 e) \left( \sum_{i=1}^{N} 2^{-n_i} - 1 \right) \le 0
\end{aligned} \tag{6.3-28}$$

where the last inequality follows from the Kraft inequality. Equality holds if and only
if $p_i = 2^{-n_i}$ for $1 \le i \le N$.

The upper bound in Equation 6.3-26 may be established under the constraint that
$n_i$, $1 \le i \le N$, are integers, by selecting the $\{n_i\}$ such that $2^{-n_i} \le p_i < 2^{-n_i + 1}$. But if
the terms $p_i \ge 2^{-n_i}$ are summed over $1 \le i \le N$, we obtain the Kraft inequality, for
which we have demonstrated that there exists a code that satisfies the prefix condition.
On the other hand, if we take the logarithm of $p_i < 2^{-n_i + 1}$, we obtain

$$\log p_i < -n_i + 1 \tag{6.3-29}$$

or, equivalently,

$$n_i < 1 - \log p_i \tag{6.3-30}$$

If we multiply both sides of Equation 6.3-30 by $p_i$ and sum over $1 \le i \le N$, we
obtain the desired upper bound given in Equation 6.3-26. This completes the proof of
Equation 6.3-26.
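
The choice of lengths used in the proof of the upper bound can be tried numerically. The sketch below picks the smallest integers $n_i$ with $2^{-n_i} \le p_i$ and checks the Kraft inequality and Equation 6.3-26. The PMF is borrowed from Example 6.3-1 and the variable names are assumptions made for illustration. These lengths are generally not optimal; they only demonstrate that the bound is achievable.

```python
from math import ceil, log2

def bound_lengths(pmf):
    """Smallest integers n_i with 2^(-n_i) <= p_i, i.e. ceil(-log2 p_i)."""
    return [ceil(-log2(p)) for p in pmf]

pmf = [0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005]
lengths = bound_lengths(pmf)

H = -sum(p * log2(p) for p in pmf)               # source entropy in bits
R = sum(p * n for p, n in zip(pmf, lengths))     # average code word length
kraft = sum(2.0 ** -n for n in lengths)

print(lengths)
print(kraft <= 1, H <= R < H + 1)
```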

We have now established that variable-length codes that satisfy the prefix condition 
are efficient source codes for any DMS with source symbols that are not equally 
probable. Let us now describe an algorithm for constructing such codes. 


The Huffman Coding Algorithm 

Huffman (1952) devised a variable-length encoding algorithm, based on the source
letter probabilities $P(x_i)$, $i = 1, 2, \ldots, L$. This algorithm is optimum in the sense
that the average number of binary digits required to represent the source symbols is a
minimum, subject to the constraint that the code words satisfy the prefix condition, as
defined above, which allows the received sequence to be uniquely and instantaneously
decodable. We illustrate this encoding algorithm by means of two examples.

EXAMPLE 6.3-1. Consider a DMS with seven possible symbols $x_1, x_2, \ldots, x_7$ having
the probabilities of occurrence illustrated in Figure 6.3-4. We have ordered the source
symbols in decreasing order of the probabilities, i.e., $P(x_1) \ge P(x_2) \ge \cdots \ge P(x_7)$.
We begin the encoding process with the two least probable symbols $x_6$ and $x_7$. These two
symbols are tied together as shown in Figure 6.3-4, with the upper branch assigned
a 0 and the lower branch assigned a 1. The probabilities of these two branches are
added together at the node where the two branches meet to yield the probability 0.01.
Now we have the source symbols $x_1, \ldots, x_5$ plus a new symbol, say $x_6'$, obtained by
combining $x_6$ and $x_7$. The next step is to join the two least probable symbols from
the set $x_1, x_2, x_3, x_4, x_5, x_6'$. These are $x_5$ and $x_6'$, which have a combined probability
of 0.05. The branch from $x_5$ is assigned a 0 and the branch from $x_6'$ is assigned a 1.
This procedure continues until we exhaust the set of possible source letters. The result
is a code tree with branches that contain the desired code words. The code words are
obtained by beginning at the rightmost node in the tree and proceeding to the left. The
resulting code words are listed in Figure 6.3-4. The average number of binary digits
per symbol for this code is $\bar{R} = 2.21$ bits per symbol. The entropy of the source is
2.11 bits per symbol.

We make the observation that the code is not necessarily unique. For example, at
the next to the last step in the encoding procedure, we have a tie between $x_1$ and $x_3'$
(the node of probability 0.35 that combines $x_3, \ldots, x_7$), since these symbols are equally
probable. At this point, we chose to pair $x_1$ with $x_2$. An alternative is to pair $x_2$ with
$x_3'$. If we choose this pairing, the resulting code is illustrated in Figure 6.3-5. The
average number of bits per source symbol for this code is also 2.21. Hence, the resulting
codes are equally efficient. Secondly, the assignment of a 0 to the upper branch and a 1
to the lower (less probable) branch is arbitrary. We may simply reverse the assignment
of a 0 and 1 and still obtain an efficient code satisfying the prefix condition.


FIGURE 6.3-4

An example of variable-length source encoding for a DMS. [Code tree not reproduced.]

Letter   Probability   Self-information   Code
x_1      0.35          1.5146             00
x_2      0.30          1.7370             01
x_3      0.20          2.3219             10
x_4      0.10          3.3219             110
x_5      0.04          4.6439             1110
x_6      0.005         7.6439             11110
x_7      0.005         7.6439             11111

H(X) = 2.11      R = 2.21


FIGURE 6.3-5

An alternative code for the DMS in Example 6.3-1. [Code tree not reproduced; the
intermediate node probabilities are 0.01, 0.05, 0.15, 0.35, and 0.65.]

Letter   Code
x_1      0
x_2      10
x_3      110
x_4      1110
x_5      11110
x_6      111110
x_7      111111

R = 2.21

EXAMPLE 6.3-2. As a second example, let us determine the Huffman code for the
output of a DMS illustrated in Figure 6.3-6. The entropy of this source is $H(X) = 2.63$
bits per symbol. The Huffman code as illustrated in Figure 6.3-6 has an average
length of $\bar{R} = 2.70$ bits per symbol. Hence, its efficiency is 0.97.
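
The merging procedure of Example 6.3-1 can be sketched with a priority queue. This is a minimal illustration rather than a reference implementation: the function name `huffman_code`, the symbol labels, and the tie-breaking rule are assumptions, and because ties may be broken differently the individual code words need not match those in the figures, although the average length does.

```python
import heapq
from itertools import count

def huffman_code(pmf):
    """Return {symbol: code word} for a dict {symbol: probability}."""
    order = count()   # tie-breaker so heap entries never compare dicts
    heap = [(p, next(order), {sym: ""}) for sym, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_low, _, group_low = heapq.heappop(heap)     # least probable node
        p_high, _, group_high = heapq.heappop(heap)   # next least probable
        # Prepend 0 on the more probable branch and 1 on the less probable one.
        merged = {s: "0" + c for s, c in group_high.items()}
        merged.update({s: "1" + c for s, c in group_low.items()})
        heapq.heappush(heap, (p_low + p_high, next(order), merged))
    return heap[0][2]

# Source of Example 6.3-1.
pmf = {"x1": 0.35, "x2": 0.30, "x3": 0.20, "x4": 0.10,
       "x5": 0.04, "x6": 0.005, "x7": 0.005}
code = huffman_code(pmf)
average_length = sum(pmf[s] * len(code[s]) for s in pmf)

for sym in pmf:
    print(sym, code[sym])
print(round(average_length, 4))
```

Code words are built by prepending a bit at each merge, which corresponds to reading the tree from the root toward the leaves.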


FIGURE 6.3-6

Huffman code for Example 6.3-2. [Code tree not reproduced; the intermediate node
probabilities include 0.15, 0.22, 0.27, 0.37, and 0.63.]

Letter   Probability   Code
x_1      0.36          00
x_2      0.14          010
x_3      0.13          011
x_4      0.12          100
x_5      0.10          101
x_6      0.09          110
x_7      0.04          1110
x_8      0.02          1111

H(X) = 2.63      R = 2.70

TABLE 6.3-2

Huffman code for Example 6.3-3

Letter   Probability   Self-information   Code
x_1      0.45          1.156              1
x_2      0.35          1.520              00
x_3      0.20          2.330              01

H(X) = 1.513 bits/letter
R_1 = 1.55 bits/letter
Efficiency = 97.6%

The variable-length encoding (Huffman) algorithm described in the above examples
generates a prefix code having an $\bar{R}$ that satisfies Equation 6.3-26. However, instead
of encoding on a symbol-by-symbol basis, a more efficient procedure is to encode
blocks of $J$ symbols at a time. In such a case, the bounds in Equation 6.3-26 become

$$J H(X) \le \bar{R}_J < J H(X) + 1 \tag{6.3-31}$$

since the entropy of a $J$-symbol block from a DMS is $J H(X)$, and $\bar{R}_J$ is the average
number of bits per $J$-symbol block. If we divide Equation 6.3-31 by $J$, we obtain

$$H(X) \le \frac{\bar{R}_J}{J} < H(X) + \frac{1}{J} \tag{6.3-32}$$

where $\bar{R}_J / J = \bar{R}$ is the average number of bits per source symbol. Hence $\bar{R}$ can be
made as close to $H(X)$ as desired by selecting $J$ sufficiently large.

EXAMPLE 6.3-3. The output of a DMS consists of letters $x_1$, $x_2$, and $x_3$ with probabilities
0.45, 0.35, and 0.20, respectively. The entropy of this source is $H(X) = 1.513$ bits
per symbol. The Huffman code for this source, given in Table 6.3-2, requires $\bar{R}_1 = 1.55$
bits per symbol and results in an efficiency of 97.6 percent. If pairs of symbols are
encoded by means of the Huffman algorithm, the resulting code is as given in Table 6.3-3.
The entropy of the source output for pairs of letters is $2H(X) = 3.026$ bits per symbol
pair. On the other hand, the Huffman code requires $\bar{R}_2 = 3.0675$ bits per symbol pair.
Thus, the efficiency of the encoding increases to $2H(X)/\bar{R}_2 = 0.986$ or, equivalently,
to 98.6 percent.


TABLE 6.3-3

Huffman code for encoding pairs of letters

Letter pair   Probability   Self-information   Code
x_1 x_1       0.2025        2.312              10
x_1 x_2       0.1575        2.676              001
x_2 x_1       0.1575        2.676              010
x_2 x_2       0.1225        3.039              011
x_1 x_3       0.09          3.486              111
x_3 x_1       0.09          3.486              0000
x_2 x_3       0.07          3.850              0001
x_3 x_2       0.07          3.850              1100
x_3 x_3       0.04          4.660              1101

2H(X) = 3.026 bits/letter pair
R_2 = 3.0675 bits/letter pair
(1/2) R_2 = 1.534 bits/letter
Efficiency = 98.6%
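
The effect of the block length $J$ in Example 6.3-3 can be reproduced with a short computation. The sketch relies on the fact that the average length of a binary Huffman code equals the sum of the probabilities of all merged nodes, since each merge adds one bit to every letter beneath it. The function names are assumptions made for illustration.

```python
import heapq
from itertools import product
from math import log2, prod

def huffman_average_length(probs):
    """Average code word length of a binary Huffman code for `probs`."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged          # every letter under this node gains one bit
        heapq.heappush(heap, merged)
    return total

def bits_per_letter(pmf, J):
    """Average bits per source letter when blocks of J letters are encoded."""
    block_probs = [prod(block) for block in product(pmf, repeat=J)]
    return huffman_average_length(block_probs) / J

pmf = [0.45, 0.35, 0.20]                      # source of Example 6.3-3
H = -sum(p * log2(p) for p in pmf)
for J in (1, 2, 3):
    print(J, round(bits_per_letter(pmf, J), 5), round(H, 5))
```

For $J = 1$ and $J = 2$ the printed rates match the values 1.55 and 1.534 bits per letter in Tables 6.3-2 and 6.3-3.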

In summary, we have demonstrated that efficient encoding for a DMS may be done
on a symbol-by-symbol basis using a variable-length code based on the Huffman algorithm.
Furthermore, the efficiency of the encoding procedure is increased by encoding
blocks of $J$ symbols at a time. Thus, the output of a DMS with entropy $H(X)$ may be
encoded by a variable-length code with an average number of bits per source letter that
approaches $H(X)$ as closely as desired.

The Huffman coding algorithm can be applied to discrete stationary sources as
well as discrete memoryless sources. Suppose we have a discrete stationary source
that emits $J$ letters with $H_J(X)$ as the entropy per letter. We can encode the sequence
of $J$ letters with a variable-length Huffman code that satisfies the prefix condition by
following the procedure described above. The resulting code has an average number of
bits for the $J$-letter block that satisfies the condition

$$H(X_1 \cdots X_J) \le \bar{R}_J < H(X_1 \cdots X_J) + 1 \tag{6.3-33}$$

By dividing each term of Equation 6.3-33 by $J$, we obtain the bounds on the average
number $\bar{R} = \bar{R}_J / J$ of bits per source letter as

$$H_J(X) \le \bar{R} < H_J(X) + \frac{1}{J} \tag{6.3-34}$$

By increasing the block size $J$, we can approach $H_J(X)$ arbitrarily closely, and in the
limit as $J \to \infty$, $\bar{R}$ satisfies

$$H_\infty(X) \le \bar{R} < H_\infty(X) + \epsilon \tag{6.3-35}$$

where $\epsilon$ approaches zero as $1/J$. Thus, efficient encoding of stationary sources is
accomplished by encoding large blocks of symbols into code words. We should emphasize,
however, that the design of the Huffman code requires knowledge of the joint
PMF for the $J$-symbol blocks.

The Lempel-Ziv Algorithm 

From our preceding discussion, we have observed that the Huffman coding algorithm 
yields optimal source codes in the sense that the code words satisfy the prefix condition 
and the average block length is a minimum. To design a Huffman code for a DMS, 
we need to know the probabilities of occurrence of all the source letters. In the case 
of a discrete source with memory, we must know the joint probabilities of blocks of 
length n > 2. However, in practice, the statistics of a source output are often unknown. 
In principle, it is possible to estimate the probabilities of the discrete source output by 
simply observing a long information sequence emitted by the source and obtaining the 
probabilities empirically. Except for the estimation of the marginal probabilities {pk}, 
corresponding to the frequency of occurrence of the individual source output letters, 
the computational complexity involved in estimating joint probabilities is extremely 
high. Consequently, the application of the Huffman coding method to source coding 
for many real sources with memory is generally impractical. 

In contrast to the Huffman coding algorithm, the Lempel-Ziv source coding
algorithm does not require the source statistics. Hence, the Lempel-Ziv algorithm
belongs to the class of universal source coding algorithms. It is a variable-to-fixed-length
algorithm, where the encoding is performed as described below.

In the Lempel-Ziv algorithm, the sequence at the output of the discrete source is 
parsed into variable-length blocks, which are called phrases. A new phrase is introduced 
every time a block of letters from the source differs from some previous phrase in the 
last letter. The phrases are listed in a dictionary, which stores the location of the existing 
phrases. In encoding a new phrase, we simply specify the location of the existing phrase 
in the dictionary and append the new letter. 

As an example, consider the binary sequence

10101101001001110101000011001110101100011011

Parsing the sequence as described above produces the following phrases:

1, 0, 10, 11, 01, 00, 100, 111, 010, 1000, 011, 001, 110, 101, 10001, 1011

We observe that each phrase in the sequence is a concatenation of a previous phrase with
a new output letter from the source. To encode the phrases, we construct a dictionary as
shown in Table 6.3-4. The dictionary locations are numbered consecutively, beginning
with 1 and counting up, in this case to 16, which is the number of phrases in the sequence.
The different phrases corresponding to each location are also listed, as shown. The code
words are determined by listing the dictionary location (in binary form) of the previous
phrase that matches the new phrase in all but the last location. Then, the new output
letter is appended to the dictionary location of the previous phrase. Initially, the location
0000 is used to encode a phrase that has not appeared previously.


TABLE 6.3-4

Dictionary for Lempel-Ziv algorithm

      Dictionary location   Dictionary contents   Code word
 1    0001                  1                     00001
 2    0010                  0                     00000
 3    0011                  10                    00010
 4    0100                  11                    00011
 5    0101                  01                    00101
 6    0110                  00                    00100
 7    0111                  100                   00110
 8    1000                  111                   01001
 9    1001                  010                   01010
10    1010                  1000                  01110
11    1011                  011                   01011
12    1100                  001                   01101
13    1101                  110                   01000
14    1110                  101                   00111
15    1111                  10001                 10101
16                          1011                  11101

The source decoder for the code constructs an identical copy of the dictionary at
the receiving end of the communication system and decodes the received sequence in
step with the transmitted data sequence.

It should be observed that the table encoded 44 source bits into 16 code words of 
5 bits each, resulting in 80 coded bits. Hence, the algorithm provided no data com- 
pression at all. However, the inefficiency is due to the fact that the sequence we have 
considered is very short. As the sequence is increased in length, the encoding procedure 
becomes more efficient and results in a compressed sequence at the output of the source. 
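
The parsing and encoding steps of this example can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the dictionary index is fixed at 4 bits as in Table 6.3-4, and dictionary overflow and any unfinished phrase at the end of the input are not handled beyond returning the leftover bits.

```python
def lz_parse(bits):
    """Split `bits` into phrases, each a previous phrase plus one new letter."""
    dictionary = {}      # phrase -> dictionary location, starting at 1
    phrases = []
    current = ""
    for b in bits:
        current += b
        if current not in dictionary:
            dictionary[current] = len(dictionary) + 1
            phrases.append(current)
            current = ""
    return phrases, dictionary, current   # `current` holds any leftover bits

def lz_encode(bits, index_bits=4):
    """Code word = location of the matching previous phrase + the new letter."""
    phrases, dictionary, leftover = lz_parse(bits)
    codewords = []
    for phrase in phrases:
        prefix, new_letter = phrase[:-1], phrase[-1]
        location = dictionary[prefix] if prefix else 0   # 0000: no previous phrase
        codewords.append(format(location, f"0{index_bits}b") + new_letter)
    return phrases, codewords, leftover

sequence = "10101101001001110101000011001110101100011011"
phrases, codewords, leftover = lz_encode(sequence)

for location, (phrase, word) in enumerate(zip(phrases, codewords), start=1):
    print(location, phrase, word)
print(len(sequence), "source bits ->", 5 * len(codewords), "coded bits")
```

Running the sketch on the 44-bit sequence of the example yields the 16 phrases and 5-bit code words listed in Table 6.3-4.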

How do we select the overall length of the table? In general, no matter how large 
the table is, it will eventually overflow. To solve the overflow problem, the source 
encoder and source decoder must use an identical procedure to remove phrases from 
the respective dictionaries that are not useful and substitute new phrases in their place. 

The Lempel-Ziv algorithm is widely used in the compression of computer files.
The "compress" and "uncompress" utilities under the UNIX® operating system and
numerous algorithms under the MS-DOS operating system are implementations of
various versions of this algorithm.


■ 6.4

LOSSY DATA COMPRESSION

Our study of data compression techniques thus far has been limited to discrete 
information sources. For continuous-amplitude information sources, the problem is quite 
different. For perfect reconstruction of a continuous-amplitude source, the number of 
required bits is infinite. This is so because representation of a general real number 
in base 2 requires an infinite number of digits. Therefore, for continuous-amplitude 
sources lossless compression is impossible, and lossy compression through scalar or 
vector quantization is employed. In this section we study the notion of lossy data 
compression and introduce the rate distortion function, which provides the fundamental limit 
on lossy data compression. To introduce the rate distortion function, we need to 
generalize the notions of entropy and mutual information to continuous random variables. 

6.4-1 Entropy and Mutual Information for Continuous Random Variables 

The definition of mutual information given for discrete random variables may be 
extended in a straightforward manner to continuous random variables. In particular, if X 
and Y are random variables with joint PDF $p(x, y)$ and marginal PDFs $p(x)$ and $p(y)$, 
the average mutual information between X and Y is defined as 

$$I(X;Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy$$ (6.4-1)

Although the definition of the average mutual information carries over to continuous 
random variables, the concept of entropy does not. The problem is that a continuous 
random variable requires an infinite number of binary digits to represent it 
exactly. Hence, its self-information is infinite, and, therefore, its entropy is also infinite. 

Nevertheless, we shall define a quantity that we call the differential entropy of the 
continuous random variable X as 

$$H(X) = -\int_{-\infty}^{\infty} p(x)\,\log p(x)\,dx$$ (6.4-2)

We emphasize that this quantity does not have the physical meaning of self-information, 
although it may appear to be a natural extension of the definition of entropy for a discrete 
random variable (see Problem 6.15). 

By defining the average conditional entropy of X given Y as 

$$H(X|Y) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x,y)\,\log p(x|y)\,dx\,dy$$ (6.4-3)

the average mutual information may be expressed as 

$$I(X;Y) = H(X) - H(X|Y)$$ (6.4-4)

or, alternatively, as 

$$I(X;Y) = H(Y) - H(Y|X)$$ (6.4-5)

In some cases of practical interest, the random variable X is discrete and Y is 
continuous. To be specific, suppose that X has possible outcomes $x_i$, $i = 1, 2, \ldots, n$, 
and Y is described by its marginal PDF $p(y)$. When X and Y are statistically dependent, 
we may express $p(y)$ as 

$$p(y) = \sum_{i=1}^{n} p(y|x_i)\,P[x_i]$$ (6.4-6)

The mutual information provided about the event $X = x_i$ by the occurrence of the event 
$Y = y$ is 

$$I(x_i; y) = \log\frac{p(y|x_i)\,P[x_i]}{p(y)\,P[x_i]} = \log\frac{p(y|x_i)}{p(y)}$$ (6.4-7)

Then the average mutual information between X and Y is 

$$I(X;Y) = \sum_{i=1}^{n}\int_{-\infty}^{\infty} p(y|x_i)\,P[x_i]\,\log\frac{p(y|x_i)}{p(y)}\,dy$$ (6.4-8)

EXAMPLE 6.4-1. Suppose that X is a discrete random variable with two equally 
probable outcomes $x_1 = A$ and $x_2 = -A$. Let the conditional PDFs $p(y|x_i)$, $i = 1, 2$, be 
Gaussian with mean $x_i$ and variance $\sigma^2$. That is, 

$$p(y|A) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(y-A)^2/2\sigma^2}$$
$$p(y|-A) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(y+A)^2/2\sigma^2}$$ (6.4-9)

The average mutual information obtained from Equation 6.4-8 becomes 

$$I(X;Y) = \frac{1}{2}\int_{-\infty}^{\infty}\left[p(y|A)\,\log\frac{p(y|A)}{p(y)} + p(y|-A)\,\log\frac{p(y|-A)}{p(y)}\right]dy$$ (6.4-10)

where 

$$p(y) = \frac{1}{2}\left[p(y|A) + p(y|-A)\right]$$ (6.4-11)

Later in this chapter it will be shown that the average mutual information $I(X;Y)$ given 
by Equation 6.4-10 represents the channel capacity of a binary-input additive white 
Gaussian noise channel. 
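
Equation 6.4-10 has no closed form, but it is easy to evaluate numerically. The sketch below is one possible approach, not a method prescribed by the text: it applies the trapezoidal rule with base-2 logarithms, and the function names, the integration range of ten standard deviations beyond each mean, and the step count are all assumptions made for illustration.

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Gaussian PDF with the given mean and standard deviation."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def mutual_information_bits(A, sigma, steps=20000):
    """Trapezoidal evaluation of Equation 6.4-10 in bits."""
    lo, hi = -A - 10 * sigma, A + 10 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        p_plus = gaussian_pdf(y, A, sigma)       # p(y | A)
        p_minus = gaussian_pdf(y, -A, sigma)     # p(y | -A)
        p_y = 0.5 * (p_plus + p_minus)           # Equation 6.4-11
        f = 0.0
        if p_plus > 0.0:
            f += 0.5 * p_plus * math.log2(p_plus / p_y)
        if p_minus > 0.0:
            f += 0.5 * p_minus * math.log2(p_minus / p_y)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f
    return total * h


for ratio in (0.5, 1.0, 2.0, 4.0):
    print(f"A/sigma = {ratio}: I(X;Y) = {mutual_information_bits(ratio, 1.0):.4f} bits")
```

The computed values grow from 0 toward 1 bit as $A/\sigma$ increases, as expected for a binary input.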


6.4-2 The Rate Distortion Function 

An analog source emits a message waveform x (t) that is a sample function of a stochastic 
process X(t). When X(t) is a band-limited, stationary stochastic process, the sampling 
theorem allows us to represent X(t) by a sequence of uniform samples taken at the 
Nyquist rate. 

By applying the sampling theorem, the output of an analog source is converted 
to an equivalent discrete-time sequence of samples. The samples are then quantized 
in amplitude and encoded. One type of simple encoding is to represent each discrete 
amplitude level by a sequence of binary digits. Hence, if we have L levels, we need 
$R = \log_2 L$ bits per sample if L is a power of 2, or $R = \lfloor \log_2 L \rfloor + 1$ if L is not a power 
of 2. On the other hand, if the levels are not equally probable and the probabilities of 
the output levels are known, we may use Huffman coding to improve the efficiency 
of the encoding process. 

Quantization of the amplitudes of the sampled signal results in data compression, 
but it also introduces some distortion of the waveform or a loss of signal fidelity. The 
minimization of this distortion is considered in this section. Many of the results given 
in this section apply directly to a discrete-time, continuous-amplitude, memoryless 
Gaussian source. Such a source serves as a good model for the residual error in a 
number of source coding methods. 

In this section we study only the fundamental limits on lossy source coding given 
by the rate distortion function. Specific techniques to achieve the bounds predicted 
by theory are not covered in this book. The interested reader is referred to books and 
papers on scalar and vector quantization, data compression, waveform, audio and video 
coding referenced at the end of this chapter. 

We begin by studying the distortion introduced when the samples from the 
information source are quantized to a fixed number of bits. By the term distortion, we 
mean some measure of the difference between the actual source samples $\{x_k\}$ and the 
corresponding quantized values $\{\hat{x}_k\}$, which we denote by $d(x_k, \hat{x}_k)$. For example, a 
commonly used distortion measure is the squared-error distortion, defined as 

$$d(x_k, \hat{x}_k) = (x_k - \hat{x}_k)^2$$ (6.4-12)

If $d(x_k, \hat{x}_k)$ is the distortion measure per letter, the distortion between a sequence 
of n samples $\mathbf{x}_n$ and the corresponding n quantized values $\hat{\mathbf{x}}_n$ is the average over the n 
source output samples, i.e., 

$$d(\mathbf{x}_n, \hat{\mathbf{x}}_n) = \frac{1}{n}\sum_{k=1}^{n} d(x_k, \hat{x}_k)$$ (6.4-13)


The source output is a random process, and hence the n samples in $\mathbf{X}_n$ are random 
variables. Therefore, $d(\mathbf{X}_n, \hat{\mathbf{X}}_n)$ is a random variable. Its expected value is defined as 
the distortion D, i.e., 

$$D = E[d(\mathbf{X}_n, \hat{\mathbf{X}}_n)] = \frac{1}{n}\sum_{k=1}^{n} E[d(X_k, \hat{X}_k)] = E[d(X, \hat{X})]$$ (6.4-14)

where the last step follows from the assumption that the source output process is 
stationary. 

Now suppose we have a memoryless source with a continuous-amplitude output X 
that has a PDF $p(x)$, a quantized amplitude output alphabet $\hat{\mathcal{X}}$, and a per letter distortion 
measure $d(x, \hat{x})$. Then the minimum rate in bits per source output that is required to 
represent the output X of the memoryless source with a distortion less than or equal to 
D is called the rate distortion function R(D) and is defined as 

$$R(D) = \min_{p(\hat{x}|x)\,:\,E[d(X,\hat{X})] \le D} I(X;\hat{X})$$ (6.4-15)

where $I(X;\hat{X})$ is the mutual information between X and $\hat{X}$. In general, the rate R(D) 
decreases as D increases, or conversely R(D) increases as D decreases. 

As seen from the definition of the rate distortion function, R(D) depends on the 
statistics of the source $p(x)$ as well as the distortion measure $d(x, \hat{x})$. A change in either 
of these two would change R(D). We also mention here that for many source statistics 
and distortion measures there exists no closed form for the rate distortion function 
R(D). 

The rate distortion function R(D) of a source is associated with the following 
fundamental source coding theorem in information theory. 

SHANNON’S THIRD THEOREM [SOURCE CODING WITH A FIDELITY CRITERION, 
SHANNON (1959)] A memoryless source X can be encoded at rate R for a distortion 
not exceeding D if R > R(D). Conversely, for any code with rate R < R(D) the 
distortion exceeds D. 

It is clear, therefore, that the rate distortion function R(D) for any source represents 
a lower bound on the source rate that is possible for a given level of distortion. 

The Rate Distortion Function for a Gaussian Source with Squared-Error Distortion 

One interesting model of a continuous-amplitude, memoryless information source is the 
Gaussian source model. For this source and the squared-error distortion measure 
$d(x, \hat{x}) = (x - \hat{x})^2$, the rate distortion function is known and is given by 

$$R_g(D) = \begin{cases} \frac{1}{2}\log_2\dfrac{\sigma^2}{D} & 0 \le D \le \sigma^2 \\ 0 & D > \sigma^2 \end{cases}$$ (6.4-16)

where $\sigma^2$ is the variance of the source. Note that $R_g(D)$ is independent of the mean 
$E[X]$ of the source. This function is plotted in Figure 6.4-1. 

FIGURE 6.4-1 
Rate distortion function for a continuous-amplitude, memoryless Gaussian source. 

We should note that Equation 6.4-16 implies that no information need be 
transmitted when the distortion $D \ge \sigma^2$. Specifically, $D = \sigma^2$ can be obtained by using 
$m = E[X]$ in the reconstruction of the signal. 

If in Equation 6.4-16 we reverse the functional dependence between D and R, we 
may express D in terms of R as 

$$D_g(R) = 2^{-2R}\sigma^2$$ (6.4-17)

This function is called the distortion rate function for the discrete-time, memoryless 
Gaussian source. 

When we express the distortion in Equation 6.4-17 in decibels, we obtain 

$$10\log_{10} D_g(R) = -6R + 10\log_{10}\sigma^2$$ (6.4-18)

Note that the mean square error distortion decreases at the rate of 6 dB/bit. 
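
As a quick numerical illustration of Equations 6.4-16 through 6.4-18, the following sketch tabulates the distortion rate function of a unit-variance Gaussian source. The function names and the choice $\sigma^2 = 1$ are assumptions made for the example.

```python
import math


def rate_gaussian(D, sigma2):
    """Rate distortion function of Equation 6.4-16, in bits per sample."""
    if D <= 0:
        raise ValueError("D must be positive")
    return 0.5 * math.log2(sigma2 / D) if D < sigma2 else 0.0


def distortion_gaussian(R, sigma2):
    """Distortion rate function of Equation 6.4-17."""
    return 2.0 ** (-2.0 * R) * sigma2


sigma2 = 1.0
for R in range(5):
    D = distortion_gaussian(R, sigma2)
    print(f"R = {R} bits  D = {D:.6f}  ({10 * math.log10(D):.2f} dB)")

# Exact slope of the distortion in dB per additional bit: 20*log10(2).
db_per_bit = 10 * math.log10(distortion_gaussian(0, sigma2) / distortion_gaussian(1, sigma2))
```

The exact slope is $20\log_{10}2 \approx 6.02$ dB per bit, which the text rounds to 6 dB/bit.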

Explicit results on the rate distortion functions for general memoryless 
non-Gaussian sources are not available. However, there are useful upper and lower bounds 
on the rate distortion function for any discrete-time, continuous-amplitude, memoryless 
source. An upper bound is given by the following theorem. 

THEOREM: UPPER BOUND ON R(D) The rate distortion function of a memoryless, 
continuous-amplitude source with zero mean and finite variance $\sigma^2$ with respect to 
the mean square error distortion measure is upper-bounded as 

$$R(D) \le \frac{1}{2}\log_2\frac{\sigma^2}{D}, \qquad 0 \le D \le \sigma^2$$ (6.4-19)

A proof of this theorem is given by Berger (1971). It implies that the Gaussian 
source requires the maximum rate among all other sources with the same variance 
for a specified level of mean square error distortion. Thus the rate distortion function 
R(D) of any continuous-amplitude memoryless source with finite variance $\sigma^2$ satisfies 
$R(D) \le R_g(D)$. Similarly, the distortion rate function of the same source satisfies the 
condition 

$$D(R) \le D_g(R) = 2^{-2R}\sigma^2$$ (6.4-20)
A lower bound on the rate distortion function also exists. This is called the Shannon 
lower bound for a mean square error distortion measure and is given as 

$$R^*(D) = H(X) - \frac{1}{2}\log_2 2\pi e D$$ (6.4-21)

where H(X) is the differential entropy of the continuous-amplitude, memoryless source. 
The distortion rate function corresponding to Equation 6.4-21 is 

$$D^*(R) = \frac{1}{2\pi e}\,2^{-2[R - H(X)]}$$ (6.4-22)

Therefore, the rate distortion function for any continuous-amplitude, memoryless 
source is bounded from above and below as 

$$R^*(D) \le R(D) \le R_g(D)$$ (6.4-23)

and the corresponding distortion rate function is bounded as 

$$D^*(R) \le D(R) \le D_g(R)$$ (6.4-24)

The differential entropy of the memoryless Gaussian source is 

$$H_g(X) = \frac{1}{2}\log_2 2\pi e\sigma^2$$ (6.4-25)

so that the lower bound $R^*(D)$ in Equation 6.4-21 reduces to $R_g(D)$. Now, if we express 
$D^*(R)$ in terms of decibels and normalize it by setting $\sigma^2 = 1$ (or dividing $D^*(R)$ by 
$\sigma^2$), we obtain from Equation 6.4-22 

$$10\log_{10} D^*(R) = -6R - 6[H_g(X) - H(X)]$$ (6.4-26)

or, equivalently, 

$$10\log_{10}\frac{D_g(R)}{D^*(R)} = 6[H_g(X) - H(X)]\ \text{dB} = 6[R_g(D) - R^*(D)]\ \text{dB}$$ (6.4-27)

The relations in Equations 6.4-26 and 6.4-27 allow us to compare the lower bound 
in the distortion with the upper bound which is the distortion for the Gaussian source. 
We note that $D^*(R)$ also decreases at -6 dB/bit. We should also mention that the 
differential entropy H(X) is upper-bounded by $H_g(X)$, as shown by Shannon (1948b). 
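
To make the gap between the two bounds concrete, the sketch below evaluates Equations 6.4-22 and 6.4-27 for a source that is uniformly distributed with unit variance. The uniform source is an assumption chosen purely for illustration; it is not an example from the text. Its differential entropy is $\log_2 a$ bits for an interval of width $a$, and unit variance requires $a = \sqrt{12}$. The function names are likewise hypothetical.

```python
import math


def shannon_lower_bound_distortion(h_bits, R):
    """D*(R) of Equation 6.4-22 for differential entropy h_bits (in bits)."""
    return 2.0 ** (-2.0 * (R - h_bits)) / (2 * math.pi * math.e)


def gaussian_differential_entropy(sigma2):
    """H_g(X) of Equation 6.4-25, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma2)


sigma2 = 1.0
h_uniform = math.log2(math.sqrt(12.0 * sigma2))   # uniform source, variance sigma2
h_gauss = gaussian_differential_entropy(sigma2)

R = 2.0
d_upper = 2.0 ** (-2.0 * R) * sigma2               # D_g(R), Equation 6.4-20
d_lower = shannon_lower_bound_distortion(h_uniform, R)
gap_db = 10 * math.log10(d_upper / d_lower)

print(f"H(X) = {h_uniform:.4f} bits, H_g(X) = {h_gauss:.4f} bits")
print(f"D*(R) = {d_lower:.5f} <= D(R) <= D_g(R) = {d_upper:.5f}")
print(f"gap = {gap_db:.3f} dB")
```

The gap does not depend on R, in agreement with Equation 6.4-27; for the uniform source it is about 1.53 dB when the exact factor $20\log_{10}2$ is used in place of 6.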

Rate Distortion Function for a Binary Source with Hamming Distortion 

Another interesting and useful case in which a closed-form expression for the rate 
distortion function exists is the case of a binary source with $p = P[X = 1] = 1 - P[X = 0]$. 
From the lossless source coding theorem, we know that this source can be 
compressed at any rate R that satisfies $R > H(X) = H_b(p)$ and can be recovered 
perfectly from the compressed data. However, if the rate falls below $H_b(p)$, errors will 
occur in compression of this source. A measure of distortion that represents the error 
probability is the Hamming distortion, defined as 

$$d(x, \hat{x}) = \begin{cases} 1 & x \ne \hat{x} \\ 0 & x = \hat{x} \end{cases}$$ (6.4-28)

The average distortion, when this distortion measure is used, is given by 

$$E[d(X, \hat{X})] = 1 \times P[X \ne \hat{X}] + 0 \times P[X = \hat{X}] = P[X \ne \hat{X}] = P_e$$ (6.4-29)

It is seen that the average of Hamming distortion is the error probability in reconstruction 
of the source. 

The rate distortion function for a binary source and with Hamming distortion is 
given by 

$$R(D) = \begin{cases} H_b(p) - H_b(D) & 0 \le D \le \min\{p, 1-p\} \\ 0 & \text{otherwise} \end{cases}$$ (6.4-30)

Note that as $D \to 0$, we have $R(D) \to H_b(p)$ as expected. 


EXAMPLE 6.4-2. A binary symmetric source is to be compressed at a rate of 0.75 bit 
per source output. For a binary symmetric source we have $p = \frac{1}{2}$ and $H_b(p) = 1$. Since 
the compression rate, 0.75, is lower than the source entropy, error-free compression 
is impossible and the best error probability is found by solving R(D) = 0.75, where 
D is $P_e$ because we employ the Hamming distortion. From Equation 6.4-30 we have 
$R(P_e) = H_b(p) - H_b(P_e) = 1 - H_b(P_e) = 0.75$. Therefore, $H_b(P_e) = 1 - 0.75 = 0.25$, 
from which we have $P_e = 0.04169$. This is the minimum error probability that can be 
achieved using a system of unlimited complexity and delay. 
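
The value of $P_e$ in Example 6.4-2 has no closed form and must be found numerically. One simple way, sketched below, is bisection on Equation 6.4-30, which works because R(D) is decreasing in D on $[0, \min\{p, 1-p\}]$. The function names and tolerance are assumptions made for illustration.

```python
import math


def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def rate_distortion_binary(D, p):
    """Equation 6.4-30: R(D) for a binary source with Hamming distortion."""
    if 0.0 <= D <= min(p, 1.0 - p):
        return binary_entropy(p) - binary_entropy(D)
    return 0.0


def min_error_probability(R, p, tol=1e-12):
    """Smallest distortion D with R(D) <= R, found by bisection."""
    lo, hi = 0.0, min(p, 1.0 - p)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_distortion_binary(mid, p) > R:
            lo = mid          # rate still too high: more distortion is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


pe = min_error_probability(0.75, 0.5)
print(f"P_e = {pe:.5f}")
```

For R = 0.75 and p = 1/2 the sketch returns a value close to the 0.04169 quoted in the example.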


■ 6.5 

CHANNEL MODELS AND CHANNEL CAPACITY 

In the model of a digital communication system described in Chapter 1, we recall that 
the transmitter building blocks consist of the discrete-input, discrete-output channel 
encoder followed by the modulator. The function of the discrete channel encoder is to 
introduce, in a controlled manner, some redundancy in the binary information sequence, 
which can be used at the receiver to overcome the effects of noise and interference 
encountered in the transmission of the signal through the channel. The encoding process 
generally involves taking k information bits at a time and mapping each k-bit sequence 
into a unique n-bit sequence, called a codeword. The amount of redundancy introduced 
by the encoding of the data in this manner is measured by the ratio n/k. The reciprocal 
of the ratio, namely k/n, is called the code rate and denoted by $R_c$. 

The binary sequence at the output of the channel encoder is fed to the modulator, 
which serves as the interface to the communication channel. As we have discussed, the 
modulator may simply map each binary digit into one of two possible waveforms; i.e., 
a 0 is mapped into $s_1(t)$ and a 1 is mapped into $s_2(t)$. Alternatively, the modulator may 
transmit q-bit blocks at a time by using $M = 2^q$ possible waveforms. 

At the receiving end of the digital communication system, the demodulator 
processes the channel-corrupted waveform and reduces each waveform to a scalar or a 
vector that represents an estimate of the transmitted data symbol (binary or M-ary). 
The detector, which follows the demodulator, may decide whether the transmitted bit 
is a 0 or a 1. In such a case, the detector has made a hard decision. If we view the 
decision process at the detector as a form of quantization, we observe that a hard 
decision corresponds to binary quantization of the demodulator output. More generally, we 
may consider a detector that quantizes to Q > 2 levels, i.e., a Q-ary detector. If M-ary 
signals are used, then $Q \ge M$. In the extreme case when no quantization is performed, 
$Q = \infty$. In the case where Q > M, we say that the detector has made a soft decision. 

The quantized output from the detector is then fed to the channel decoder, which 
exploits the available redundancy to correct for channel disturbances. 

In the following sections, we describe three channel models that will be used to 
establish the maximum achievable bit rate for the channel. 


6.5-1 Channel Models 

In this section we describe channel models that will be useful in the design of codes. 
A general communication channel is described in terms of its set of possible 
inputs, denoted by $\mathcal{X}$ and called the input alphabet; the set of possible channel 
outputs, denoted by $\mathcal{Y}$ and called the output alphabet; and the conditional 
probability that relates the input and output sequences of any length n, which is denoted by 
$P[y_1, y_2, \ldots, y_n | x_1, x_2, \ldots, x_n]$, where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ 
represent input and output sequences of length n, respectively. A channel is called 
memoryless if we have 

$$P[\mathbf{y}|\mathbf{x}] = \prod_{i=1}^{n} P[y_i|x_i] \quad \text{for all } n$$ (6.5-1)

In other words, a channel is memoryless if the output at time i depends only on the 
input at time i. 

The simplest channel model is the binary symmetric channel, which corresponds 
to the case with $\mathcal{X} = \mathcal{Y} = \{0, 1\}$. This is an appropriate channel model for binary 
modulation and hard decisions at the detector. 

The Binary Symmetric Channel (BSC) Model 

Let us consider an additive noise channel and let the modulator and the 
demodulator/detector be included as parts of the channel. If the modulator employs binary 
waveforms and the detector makes hard decisions, then the composite channel, shown 
in Figure 6.5-1, has a discrete-time binary input sequence and a discrete-time binary 
output sequence. Such a composite channel is characterized by the set $\mathcal{X} = \{0, 1\}$ of 
possible inputs, the set $\mathcal{Y} = \{0, 1\}$ of possible outputs, and a set of conditional 
probabilities that relate the possible outputs to the possible inputs. If the channel noise 
and other disturbances cause statistically independent errors in the transmitted binary 
sequence with average probability p, then 

$$P[Y = 0|X = 1] = P[Y = 1|X = 0] = p$$
$$P[Y = 1|X = 1] = P[Y = 0|X = 0] = 1 - p$$ (6.5-2)

FIGURE 6.5-1 
A composite discrete input, discrete output channel formed by including the modulator and the 
demodulator as part of the channel. 

Thus, we have reduced the cascade of the binary modulator, the waveform channel, 
and the binary demodulator and detector to an equivalent discrete-time channel which 
is represented by the diagram shown in Figure 6.5-2. This binary input, binary output, 
symmetric channel is simply called a binary symmetric channel (BSC). Since each 
output bit from the channel depends only on the corresponding input bit, we say that 
the channel is memoryless. 
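
The BSC model is simple to simulate. The sketch below flips each input bit independently with probability p and measures the empirical error rate; the function name, the seed, the sequence length, and the value p = 0.1 are all arbitrary choices made for illustration.

```python
import random


def bsc_transmit(bits, p, rng):
    """Pass a bit sequence through a BSC with crossover probability p.

    Each bit is flipped independently of all others, so the channel is
    memoryless and obeys Equation 6.5-2.
    """
    return [bit ^ 1 if rng.random() < p else bit for bit in bits]


rng = random.Random(2024)        # fixed seed so that the run is repeatable
n, p = 100_000, 0.1
sent = [rng.getrandbits(1) for _ in range(n)]
received = bsc_transmit(sent, p, rng)

error_rate = sum(s != r for s, r in zip(sent, received)) / n
print(f"empirical crossover probability: {error_rate:.4f} (target {p})")
```

For long sequences the measured error rate settles near p, which is the law-of-large-numbers behavior used later in the discussion of channel capacity.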

The Discrete Memoryless Channel (DMC) 

The BSC is a special case of a more general discrete input, discrete output channel. The 
discrete memoryless channel is a channel model in which the input and output alphabets 
$\mathcal{X}$ and $\mathcal{Y}$ are discrete sets and the channel is memoryless. For instance, this is the case 
when the channel uses an M-ary memoryless modulation scheme and the output of 
the detector consists of Q-ary symbols. The composite channel consists of 
modulator-channel-detector as shown in Figure 6.5-1, and its input-output characteristics are 
described by a set of MQ conditional probabilities 

$$P[y|x] \quad \text{for } x \in \mathcal{X},\ y \in \mathcal{Y}$$ (6.5-3)

The graphical representation of a DMC is shown in Figure 6.5-3. 

FIGURE 6.5-2 
Binary symmetric channel. 

FIGURE 6.5-3 
Discrete memoryless channel. 

In general, the conditional probabilities $\{P[y|x]\}$ that characterize a DMC can be 
arranged in an $|\mathcal{X}| \times |\mathcal{Y}|$ matrix of the form $\mathbf{P} = [p_{ij}]$, $1 \le i \le |\mathcal{X}|$, $1 \le j \le |\mathcal{Y}|$. 
$\mathbf{P}$ is called the probability transition matrix for the channel. 

The Discrete-Input, Continuous-Output Channel 

Now, suppose that the input to the modulator comprises symbols selected from a finite 
and discrete input alphabet $\mathcal{X}$, with $|\mathcal{X}| = M$, and the output of the detector is 
unquantized, i.e., $\mathcal{Y} = \mathbb{R}$. This leads us to define a composite discrete-time memoryless 
channel that is characterized by the discrete input X, the continuous output Y, and the 
set of conditional probability density functions 

$$p(y|x), \quad x \in \mathcal{X}$$ (6.5-4)

The most important channel of this type is the additive white Gaussian noise (AWGN) 
channel, for which 

$$Y = X + N$$ (6.5-5)

where N is a zero-mean Gaussian random variable with variance $\sigma^2$. For a given X = x, 
it follows that Y is Gaussian with mean x and variance $\sigma^2$. That is, 

$$p(y|x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(y-x)^2/2\sigma^2}$$ (6.5-6)

For any given input sequence $X_i$, $i = 1, 2, \ldots, n$, there is a corresponding output 
sequence 

$$Y_i = X_i + N_i, \quad i = 1, 2, \ldots, n$$ (6.5-7)

The condition that the channel is memoryless may be expressed as 

$$p(y_1, y_2, \ldots, y_n | x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(y_i|x_i)$$ (6.5-8)
The Discrete-Time AWGN Channel 

This is a channel in which $\mathcal{X} = \mathcal{Y} = \mathbb{R}$. At each instant of time i, an input $x_i \in \mathbb{R}$ is 
transmitted over the channel. The received symbol is given by 

$$y_i = x_i + n_i$$ (6.5-9)

where the $n_i$'s are iid zero-mean Gaussian random variables with variance $\sigma^2$. In addition, 
it is usually assumed that the channel input satisfies a power constraint of the form 

$$E[X^2] \le P$$ (6.5-10)

Under this input power constraint, for any input sequence of the form $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, 
where n is large, with probability approaching 1 we have 

$$\frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{1}{n}\|\mathbf{x}\|^2 \le P$$ (6.5-11)

The geometric interpretation of the above constraint is that the input sequences to the 
channel are inside an n-dimensional sphere of radius $\sqrt{nP}$ centered at the origin. 

The AWGN Waveform Channel 

We may separate the modulator and the demodulator from the physical channel, and 
we consider a channel model in which the inputs are waveforms and the outputs are 
waveforms. Let us assume that such a channel has a given bandwidth W, with ideal 
frequency response $C(f) = 1$ within the frequency range $[-W, +W]$, and the signal 
at its output is corrupted by additive white Gaussian noise. Suppose that x(t) is a 
band-limited input to such a channel and y(t) is the corresponding output. Then 

$$y(t) = x(t) + n(t)$$ (6.5-12)

where n(t) represents a sample function of the additive white Gaussian noise process 
with power spectral density of $N_0/2$. Usually, the channel input is subject to a power 
constraint of the form 

$$E[X^2(t)] \le P$$ (6.5-13)

which for ergodic inputs results in an input power constraint of the form 

$$\lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} x^2(t)\,dt \le P$$ (6.5-14)

A suitable method for defining a set of probabilities that characterize the channel 
is to expand x(t), y(t), and n(t) into a complete set of orthonormal functions. From the 
dimensionality theorem discussed in Section 4.6-1, we know that the dimensionality of 
the space of signals with an approximate bandwidth of W and an approximate duration 
of T is roughly 2WT. Therefore we need a set of 2W dimensions per second to expand 
the input signals. We can add adequate signals to this set to make it a complete set 
of orthonormal signals that, by Example 2.8-1, can be used for expansion of white 
processes. Hence, we can express x(t), y(t), and n(t) in the form 

$$x(t) = \sum_j x_j\,\phi_j(t)$$
$$n(t) = \sum_j n_j\,\phi_j(t)$$
$$y(t) = \sum_j y_j\,\phi_j(t)$$ (6.5-15)

where $\{y_j\}$, $\{x_j\}$, and $\{n_j\}$ are the sets of coefficients in the corresponding expansions, 
e.g., 

$$y_j = \int_{-\infty}^{\infty} y(t)\,\phi_j(t)\,dt = \int_{-\infty}^{\infty} \left(x(t) + n(t)\right)\phi_j(t)\,dt = x_j + n_j$$ (6.5-16)


We may now use the coefficients in the expansion for characterizing the channel. 
Since 

$$y_j = x_j + n_j$$ (6.5-17)

where the $n_j$'s are iid zero-mean Gaussian random variables with variance $\sigma^2 = N_0/2$, it 
follows that 

$$p(y_j|x_j) = \frac{1}{\sqrt{\pi N_0}}\,e^{-(y_j - x_j)^2/N_0}, \quad j = 1, 2, \ldots$$ (6.5-18)

and by the independence of the $n_j$'s 

$$p(y_1, y_2, \ldots, y_N | x_1, x_2, \ldots, x_N) = \prod_{j=1}^{N} p(y_j|x_j)$$ (6.5-19)

for any N. In this manner, the AWGN waveform channel is reduced to an equivalent 
discrete-time channel characterized by the conditional PDF given in Equation 6.5-18. 
The power constraint on the input waveforms given by Equation 6.5-14 can be written as 

$$\lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} x^2(t)\,dt = \lim_{T\to\infty}\frac{1}{T}\sum_{j=1}^{2WT} x_j^2 = \lim_{T\to\infty}\frac{1}{T}\times 2WT\,E[X^2] = 2W\,E[X^2] \le P$$ (6.5-20)

where the first equality follows from orthonormality of the $\{\phi_j(t),\ j = 1, 2, \ldots, 2WT\}$, 
the second equality follows from the law of large numbers applied to the sequence 
$\{x_j,\ 1 \le j \le 2WT\}$, and the last inequality follows from Equation 6.5-14. From 
Equation 6.5-20 we conclude that in the discrete-time channel model we have 

$$E[X^2] \le \frac{P}{2W}$$ (6.5-21)

From Equations 6.5-19 and 6.5-21 it is clear that the waveform AWGN channel 
with bandwidth constraint W and input power constraint P is equivalent to 2W uses 
per second of a discrete-time AWGN channel with noise variance of $\sigma^2 = N_0/2$ and an 
input power constraint given by Equation 6.5-21. 


6.5-2 Channel Capacity 

We have seen that the entropy and the rate distortion function provide the fundamental 
limits for lossless and lossy data compression. The entropy and the rate distortion 
function provide the minimum required rates for compression of a discrete memoryless 
source subject to the condition that it can be losslessly recovered, or can be recovered 
with a distortion not exceeding a specific D, respectively. In this section we introduce 
a third fundamental quantity called channel capacity that provides the maximum rate 
at which reliable communication over a channel is possible. 

Let us consider a discrete memoryless channel with crossover probability of p. In 
transmission of 1 bit over this channel the error probability is p, and when a sequence 
of length n is transmitted over this channel, the probability of receiving the sequence 
correctly is $(1 - p)^n$, which goes to zero as $n \to \infty$. One approach to improve the 
performance of this channel is not to use all binary sequences of length n as possible inputs 
to this channel but to choose a subset of them and use only that subset. Of course this 
subset has to be selected in such a way that the sequences in it are in some sense “far 
apart” such that they can be recognized and correctly detected at the receiver even in 
the presence of channel errors. 

Let us assume a binary sequence of length n is transmitted over the channel. If n is 
large, the law of large numbers states that with high probability np bits will be received 
in error, and as $n \to \infty$, the probability of receiving np bits in error approaches 1. The 
number of sequences of length n that are different from the transmitted sequence at np 
positions (np an integer) is 

$$\binom{n}{np} = \frac{n!}{(np)!\,(n(1-p))!}$$ (6.5-22)

By using Stirling’s approximation that states for large n we have 

$$n! \approx \sqrt{2\pi n}\,n^n e^{-n}$$ (6.5-23)

Equation 6.5-22 can be approximated as 

$$\binom{n}{np} \approx 2^{nH_b(p)}$$ (6.5-24)
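
The quality of the approximation in Equation 6.5-24 can be checked numerically by comparing the exponent $\frac{1}{n}\log_2\binom{n}{np}$ with $H_b(p)$. The sketch below uses the log-gamma function to avoid forming huge factorials; the function names and the choice p = 0.1 are assumptions made for the illustration.

```python
import math


def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def log2_binomial(n, k):
    """log2 of the binomial coefficient C(n, k), computed via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)


p = 0.1
for n in (100, 1000, 10000, 100000):
    k = round(n * p)                       # np must be an integer
    exponent = log2_binomial(n, k) / n
    print(f"n = {n:6d}: (1/n) log2 C(n, np) = {exponent:.5f}, H_b(p) = {binary_entropy(p):.5f}")
```

The normalized exponent approaches $H_b(p)$ from below as n grows; the residual difference comes from the factor of order $\sqrt{n}$ that Equation 6.5-24 neglects.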
This means that when any sequence of length n is transmitted, it is highly probable 
that one of the $2^{nH_b(p)}$ sequences that are different from the transmitted sequence in np 
positions will be received. If we insist on using all possible input sequences for this channel, errors 
are inevitable since there will be considerable overlap between the received sequences. 
However, if we use a subset of all possible input sequences, and choose this subset 
such that the set of highly probable received sequences for each element of this subset 
is nonoverlapping, then reliable communication is possible. Since the total number of 
binary sequences of length n at the channel output is $2^n$, we can have at most 

$$M = \frac{2^n}{2^{nH_b(p)}} = 2^{n(1 - H_b(p))}$$ (6.5-25)

sequences of length n transmitted without their corresponding highly probable received 
sequences overlapping. Therefore, in n uses of the channel we can transmit M messages, 
and the rate, i.e., the information transmitted per each use of the channel, is given by 

$$R = \frac{1}{n}\log_2 M = 1 - H_b(p)$$ (6.5-26)

The quantity $1 - H_b(p)$ is the maximum rate for reliable communication over a binary 
symmetric channel and is called the capacity of this channel. In general the capacity of 
a channel, denoted by C, is the maximum rate at which reliable communication, i.e., 
communication with arbitrarily small error probability, over the channel is possible. 

For an arbitrary DMC the capacity is given by 

$$C = \max_{\mathbf{p}} I(X;Y)$$ (6.5-27)

where the maximization is over all PMFs of the form $\mathbf{p} = (p_1, p_2, \ldots, p_{|\mathcal{X}|})$ on the 
input alphabet $\mathcal{X}$. The $p_i$'s naturally satisfy the constraints 

$$p_i \ge 0, \quad i = 1, 2, \ldots, |\mathcal{X}|$$
$$\sum_{i=1}^{|\mathcal{X}|} p_i = 1$$ (6.5-28)

The units of C are bits per transmission or bits per channel use, if in computing $I(X;Y)$ 
logarithms are in base 2, and nats per transmission when the natural logarithm (base e) 
is used. If a symbol enters the channel every $\tau_s$ seconds, the channel capacity is $C/\tau_s$ 
bits/s or nats/s. 
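
For small alphabets, the maximization in Equation 6.5-27 can be carried out by brute force. The sketch below computes $I(X;Y)$ from an input PMF and a probability transition matrix, then searches a grid of input PMFs for a binary-input channel. The grid search is only an illustrative device, not a method from the text, and the function names and grid size are assumptions.

```python
import math


def mutual_information(px, P):
    """I(X;Y) in bits for input PMF px and transition matrix P[i][j] = P[y_j | x_i]."""
    num_outputs = len(P[0])
    py = [sum(px[i] * P[i][j] for i in range(len(px))) for j in range(num_outputs)]
    total = 0.0
    for i, pxi in enumerate(px):
        for j in range(num_outputs):
            if pxi > 0.0 and P[i][j] > 0.0:
                total += pxi * P[i][j] * math.log2(P[i][j] / py[j])
    return total


def capacity_binary_input(P, grid=1000):
    """Grid search over P[X = x_1] = q for a channel with two inputs.

    Returns (capacity estimate in bits per channel use, maximizing q).
    """
    return max((mutual_information([q / grid, 1.0 - q / grid], P), q / grid)
               for q in range(grid + 1))


p = 0.1
bsc = [[1.0 - p, p],
       [p, 1.0 - p]]
C, q_best = capacity_binary_input(bsc)
print(f"BSC with p = {p}: C = {C:.4f} bits/channel use at P[X = x_1] = {q_best}")
```

For the BSC the search recovers the uniform input distribution and the value $1 - H_b(p)$ obtained earlier by the counting argument.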

The significance of the channel capacity is due to the following fundamental 
theorem, known as the noisy channel coding theorem. 

SHANNON’S SECOND THEOREM: THE NOISY CHANNEL CODING THEOREM (SHANNON 1948) 
Reliable communication over a discrete memoryless channel is possible if the 
communication rate R satisfies R < C, where C is the channel capacity. At rates higher than 
capacity, reliable communication is impossible. 

The noisy channel coding theorem is of utmost significance in communication 
theory. This theorem expresses the limit to reliable communication and provides a 
yardstick to measure the performance of communication systems. A system performing 
near capacity is a near optimal system and does not have much room for improvement. 
On the other hand a system operating far from this fundamental bound can be improved 
mainly through coding techniques described in Chapters 7 and 8. Although we have 
stated the noisy channel coding theorem for discrete memoryless channels, this theorem 
applies to a much larger class of channels. For details see the paper by Verdú and Han 
(1994). 

We also note that Shannon’s proof of the noisy channel coding theorem is noncon- 
structive and employs a technique introduced by Shannon called random coding. In 
this technique instead of looking for the best possible coding scheme and analyzing its 
performance, which is a difficult task, all possible coding schemes are considered and 
the performance of the system is averaged over them. Then it is proved that if R < C, 
the average error probability tends to zero. This proves that among all possible coding 
schemes there exists at least one code for which the error probability tends to zero. We 
will discuss this notion in greater detail in Section 6.8-2. 

EXAMPLE 6.5-1. For a BSC, due to the symmetry of the channel, the capacity is
achieved for a uniform input distribution, i.e., for $P[X = 1] = P[X = 0] = \frac{1}{2}$. The
maximum mutual information is given by

$$C = 1 + p \log_2 p + (1 - p)\log_2(1 - p) = 1 - H(p) \tag{6.5-29}$$

This agrees with our earlier intuitive reasoning. A plot of $C$ versus $p$ is illustrated
in Figure 6.5-4. Note that for $p = 0$, the capacity is 1 bit/channel use. On the other
hand, for $p = \frac{1}{2}$ the mutual information between input and output is zero. Hence, the
channel capacity is zero. For $\frac{1}{2} < p \le 1$, we may reverse the position of 0 and 1 at the
output of the BSC, so that $C$ becomes symmetric with respect to the point $p = \frac{1}{2}$. In
our treatment of binary modulation and demodulation given in Chapter 4, we showed
that $p$ is a monotonic function of the SNR per bit. Consequently, when $C$ is plotted as
a function of the SNR per bit, it increases monotonically as the SNR per bit increases.
This characteristic behavior of $C$ versus SNR per bit is illustrated in Figure 6.5-5 for
the case where the binary modulation scheme is antipodal signaling.
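The closed form in Equation 6.5-29 is easy to evaluate numerically. The following is a minimal sketch, not part of the original text; the function names `binary_entropy` and `bsc_capacity` are illustrative choices.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, using the convention 0*log2(0) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(p):
    """Capacity of a BSC with crossover probability p (Equation 6.5-29)."""
    return 1.0 - binary_entropy(p)


if __name__ == "__main__":
    # The capacity is symmetric about p = 1/2 and vanishes there.
    for p in (0.0, 0.01, 0.11, 0.5, 0.89, 1.0):
        print(f"p = {p:4.2f}   C = {bsc_capacity(p):.4f} bits/channel use")
```

Evaluating the function at $p$ and $1 - p$ gives the same value, which reflects the relabeling argument used in the example.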


FIGURE 6.5-4

The capacity of a BSC.

FIGURE 6.5-5

The capacity plot versus SNR per bit.


The Capacity of the Discrete-Time Binary-Input AWGN Channel
We consider the binary-input AWGN channel with inputs $\pm A$ and noise variance $\sigma^2$. The
transition probability density function for this channel is defined by Equation 6.5-6, where
$x = \pm A$. By symmetry, the capacity of this channel is achieved by a symmetric input PMF,
i.e., by letting $P[X = A] = P[X = -A] = \frac{1}{2}$. Using these input probabilities, the
capacity of this channel in bits per channel use is given by

$$C = \frac{1}{2}\int_{-\infty}^{\infty} p(y|A)\log_2\frac{p(y|A)}{p(y)}\,dy + \frac{1}{2}\int_{-\infty}^{\infty} p(y|{-A})\log_2\frac{p(y|{-A})}{p(y)}\,dy \tag{6.5-30}$$


The capacity in this case does not have a closed form. In Problem 6.50 it is shown that
the capacity of this channel can be written as

$$C = \frac{1}{2}\,g\!\left(\frac{A}{\sigma}\right) + \frac{1}{2}\,g\!\left(-\frac{A}{\sigma}\right) \tag{6.5-31}$$

where

$$g(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(u-x)^2/2}\,\log_2\frac{2}{1+e^{-2ux}}\,du \tag{6.5-32}$$

Figure 6.5-6 illustrates $C$ as a function of the ratio $\mathcal{E}_b/N_0$. Note that $C$ increases
monotonically from 0 to 1 bit per symbol as this ratio increases. The two points shown on this
plot correspond to transmission rates of $\frac{1}{2}$ and $\frac{1}{3}$. Note that the $\mathcal{E}_b/N_0$ required to achieve
these rates is 0.188 and -0.496 dB, respectively.
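Because Equation 6.5-32 has no closed form, the capacity has to be evaluated numerically. The sketch below is an illustration only: it integrates $g(x)$ with a plain trapezoidal rule and then bisects on $A/\sigma$ to find the SNR per bit at which the capacity equals a target rate. It assumes $\sigma^2 = N_0/2$ and an energy per channel use of $A^2 = R\,\mathcal{E}_b$, so that $\mathcal{E}_b/N_0 = (A/\sigma)^2/(2R)$; the function names, integration width, and step count are arbitrary choices.

```python
import math


def g(x, half_width=10.0, steps=4000):
    """Trapezoidal evaluation of g(x) in Equation 6.5-32."""
    lo, hi = x - half_width, x + half_width
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        t = -2.0 * u * x
        # log(1 + e^t), written to avoid overflow for large t
        softplus = t if t > 30.0 else math.log1p(math.exp(t))
        # log2(2 / (1 + e^t)) = 1 - log2(1 + e^t)
        f = math.exp(-((u - x) ** 2) / 2.0) * (1.0 - softplus / math.log(2.0))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f
    return total * h / math.sqrt(2.0 * math.pi)


def capacity(a_over_sigma):
    """Binary-input AWGN capacity in bits per channel use (Equation 6.5-31)."""
    return 0.5 * g(a_over_sigma) + 0.5 * g(-a_over_sigma)


def ebn0_db_for_rate(rate, iterations=50):
    """SNR per bit (dB) at which the capacity equals the given rate."""
    lo, hi = 0.0, 6.0  # capacity is monotone in A/sigma on this bracket
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if capacity(mid) < rate:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return 10.0 * math.log10(s * s / (2.0 * rate))


if __name__ == "__main__":
    for rate in (1 / 2, 1 / 3):
        print(f"rate {rate:.3f}: Eb/N0 = {ebn0_db_for_rate(rate):.3f} dB")
```

Under these assumptions the two rates marked in Figure 6.5-6 can be checked directly against the values quoted in the text.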


FIGURE 6.5-6

The capacity of binary input AWGN channel.


Capacity of Symmetric Channels
It is interesting to note that in the two channel
models described above, the BSC and the discrete-time binary-input AWGN channel,
the choice of equally probable input symbols maximizes the average mutual information.
Thus, the capacity of the channel is obtained when the input symbols are
equally probable. This is not always the solution for the capacity formulas given in
Equation 6.5-27, however. In the two channel models considered above, the channel
transition probabilities exhibit a form of symmetry that results in the maximum of
$I(X; Y)$ being obtained when the input symbols are equally probable. A channel is
called a symmetric channel when each row of $\mathbf{P}$ is a permutation of any other row
and each column of it is a permutation of any other column. For symmetric channels,
input symbols with equal probability maximize $I(X; Y)$. The resulting capacity of a
symmetric channel is

$$C = \log_2|\mathcal{Y}| - H(\mathbf{p}) \tag{6.5-33}$$

where $\mathbf{p}$ is the PMF given by any row of $\mathbf{P}$. Note that since the rows of $\mathbf{P}$ are
permutations of each other, the entropy of the PMF corresponding to each row is independent
of the row. One example of a symmetric channel is the binary symmetric channel, for
which $\mathbf{p} = (p, 1 - p)$ and $|\mathcal{Y}| = 2$; therefore $C = 1 - H_b(p)$.

In general, for an arbitrary DMC, the necessary and sufficient conditions for the
set of input probabilities $\{P[x]\}$ to maximize $I(X; Y)$ and, thus, to achieve capacity on
a DMC are that (Problem 6.52)

$$\begin{aligned} I(x; Y) &= C \quad \text{for all } x \in \mathcal{X} \text{ with } P[x] > 0 \\ I(x; Y) &\le C \quad \text{for all } x \in \mathcal{X} \text{ with } P[x] = 0 \end{aligned} \tag{6.5-34}$$

where $C$ is the capacity of the channel and

$$I(x; Y) = \sum_{y \in \mathcal{Y}} P[y|x] \log\frac{P[y|x]}{P[y]} \tag{6.5-35}$$

Usually, it is relatively easy to check if the equally probable set of input symbols
satisfies the conditions given in Equation 6.5-34. If they do not, then one must determine
the set of unequal probabilities $\{P[x]\}$ that satisfies Equation 6.5-34.
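The check described above can be carried out mechanically. The following sketch is illustrative only: it evaluates $I(x; Y)$ from Equation 6.5-35 for each input and tests the conditions of Equation 6.5-34 for a candidate input PMF. The channel is assumed to be given as a list of rows `P[x][y]`, and the helper names and the Z-channel test case are not taken from the text.

```python
import math


def per_input_information(P, px):
    """Return I(x;Y) in bits (Equation 6.5-35) for every input symbol x."""
    n_in, n_out = len(P), len(P[0])
    py = [sum(px[x] * P[x][y] for x in range(n_in)) for y in range(n_out)]
    info = []
    for x in range(n_in):
        total = 0.0
        for y in range(n_out):
            if P[x][y] > 0:
                if py[y] == 0:
                    # An unused input that reaches an otherwise impossible output
                    total = math.inf
                    break
                total += P[x][y] * math.log2(P[x][y] / py[y])
        info.append(total)
    return info


def achieves_capacity(P, px, tol=1e-9):
    """Test the conditions of Equation 6.5-34 for the input PMF px."""
    info = per_input_information(P, px)
    used = [i for i, p in zip(info, px) if p > 0]
    c = max(used)
    equal_on_support = all(abs(i - c) <= tol for i in used)
    unused_not_better = all(i <= c + tol for i, p in zip(info, px) if p == 0)
    return equal_on_support and unused_not_better


if __name__ == "__main__":
    bsc = [[0.9, 0.1], [0.1, 0.9]]        # symmetric channel
    z_channel = [[1.0, 0.0], [0.5, 0.5]]  # asymmetric channel (hypothetical example)
    uniform = [0.5, 0.5]
    print(per_input_information(bsc, uniform), achieves_capacity(bsc, uniform))
    print(per_input_information(z_channel, uniform), achieves_capacity(z_channel, uniform))
```

For the BSC the uniform input passes the test, while for the asymmetric channel the two values of $I(x; Y)$ differ, so a nonuniform input PMF would have to be found.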


The Capacity of Discrete-Time AWGN Channel with an Input Power Constraint
Here we deal with the channel model

$$Y_i = X_i + N_i \tag{6.5-36}$$

where the $N_i$'s are iid zero-mean Gaussian random variables with variance $\sigma^2$ and the input $X$
is subject to the power constraint

$$E[X^2] \le P \tag{6.5-37}$$

For large $n$, the law of large numbers states that

$$\frac{1}{n}\|\mathbf{y}\|^2 \to E[X^2] + E[N^2] \le P + \sigma^2 \tag{6.5-38}$$

Equation 6.5-38 states that the output vector $\mathbf{y}$ is inside an $n$-dimensional sphere of
radius $\sqrt{n(P + \sigma^2)}$. If $\mathbf{x}$ is transmitted, the received vector $\mathbf{y} = \mathbf{x} + \mathbf{n}$ satisfies

$$\frac{1}{n}\|\mathbf{y} - \mathbf{x}\|^2 = \frac{1}{n}\|\mathbf{n}\|^2 \to \sigma^2 \tag{6.5-39}$$

which means that if $\mathbf{x}$ is transmitted, with high probability $\mathbf{y}$ will be in an $n$-dimensional
sphere of radius $\sqrt{n\sigma^2}$ centered at $\mathbf{x}$. The maximum number of spheres of radius
$\sqrt{n\sigma^2}$ that can be packed in a sphere of radius $\sqrt{n(P + \sigma^2)}$ is the ratio of the volumes
of the spheres. The volume of an $n$-dimensional sphere is given by $V_n = B_n R^n$, where
$B_n$ is given by Equation 4.7-15. Therefore, the maximum number of messages that can
be transmitted and still be resolvable at the receiver is

$$M = \frac{B_n\left(\sqrt{n(P + \sigma^2)}\right)^n}{B_n\left(\sqrt{n\sigma^2}\right)^n} = \left(1 + \frac{P}{\sigma^2}\right)^{n/2} \tag{6.5-40}$$

which results in a rate of

$$R = \frac{1}{n}\log_2 M = \frac{1}{2}\log_2\left(1 + \frac{P}{\sigma^2}\right) \quad \text{bits/transmission} \tag{6.5-41}$$


This result can be obtained by direct maximization of $I(X; Y)$ over all input PDFs
$p(x)$ that satisfy the power constraint $E[X^2] \le P$. The input PDF that maximizes
$I(X; Y)$ is a zero-mean Gaussian PDF with variance $P$. A plot of the capacity for this
channel versus SNR per bit is shown in Figure 6.5-7. The points corresponding to
$C = \frac{1}{2}$ and $C = \frac{1}{3}$ are also shown on the figure.


FIGURE 6.5-7

The capacity of a discrete-time AWGN channel.


The Capacity of Band-Limited Waveform AWGN Channel with an Input Power Constraint
As we have seen by the discussion following Equation 6.5-21, this channel
model is equivalent to $2W$ uses per second of a discrete-time AWGN channel with input
power constraint of $\frac{P}{2W}$ and noise variance of $\sigma^2 = \frac{N_0}{2}$. The capacity of this discrete-time
channel is

$$C = \frac{1}{2}\log_2\left(1 + \frac{P/2W}{N_0/2}\right) = \frac{1}{2}\log_2\left(1 + \frac{P}{N_0 W}\right) \quad \text{bits/channel use} \tag{6.5-42}$$

Therefore, the capacity of the continuous-time channel is given by

$$C = 2W \times \frac{1}{2}\log_2\left(1 + \frac{P}{N_0 W}\right) = W\log_2\left(1 + \frac{P}{N_0 W}\right) \quad \text{bits/s} \tag{6.5-43}$$

This is the celebrated equation for the capacity of a band-limited AWGN channel with
input power constraint derived by Shannon (1948b).

From Equation 6.5-43, it is clear that the capacity increases by increasing $P$, and
in fact $C \to \infty$ as $P \to \infty$. However, the rate by which the capacity increases at
large values of $P$ is a logarithmic rate. Increasing $W$, however, has a dual role on the
capacity. On one hand, it causes the capacity to be increased because higher bandwidth
means more transmissions over the channel per unit time. On the other hand, increasing
$W$ decreases the SNR defined by $\frac{P}{N_0 W}$. This is so because increasing the bandwidth
increases the effective noise power entering the receiver. To see how the capacity
changes as $W \to \infty$, we need to use the relation $\ln(1 + x) \to x$ as $x \to 0$ to get

$$C_\infty = \lim_{W \to \infty} W\log_2\left(1 + \frac{P}{N_0 W}\right) = (\log_2 e)\frac{P}{N_0} \approx 1.44\,\frac{P}{N_0} \quad \text{bits/s} \tag{6.5-44}$$

It is clear that having infinite bandwidth cannot increase the capacity indefinitely,
and its effect is limited by the amount of available power. This is in contrast to the
effect of having infinite power that, regardless of the amount of available bandwidth,
can increase the capacity indefinitely.
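The saturation of the capacity with bandwidth can be seen numerically. The following minimal sketch, with illustrative function names and arbitrary values of $P$ and $N_0$, evaluates Equation 6.5-43 for growing $W$ and compares it with the limit in Equation 6.5-44.

```python
import math


def awgn_capacity(P, N0, W):
    """Capacity in bits/s of a band-limited AWGN channel (Equation 6.5-43)."""
    # log1p keeps the result accurate when P / (N0 * W) is very small
    return W * math.log1p(P / (N0 * W)) / math.log(2.0)


def infinite_bandwidth_capacity(P, N0):
    """Limiting capacity in bits/s as W grows without bound (Equation 6.5-44)."""
    return math.log2(math.e) * P / N0


if __name__ == "__main__":
    P, N0 = 1.0, 1.0  # arbitrary illustrative values
    for W in (1.0, 10.0, 100.0, 1e4, 1e6):
        print(f"W = {W:>9.0f} Hz   C = {awgn_capacity(P, N0, W):.6f} bits/s")
    print(f"limit          C = {infinite_bandwidth_capacity(P, N0):.6f} bits/s")
```

The printed values increase with $W$ but never exceed the limit, whereas doubling $P$ at any fixed $W$ always raises the capacity.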

To derive a fundamental relation between the bandwidth and power efficiency of a 
communication system, we note that for reliable communication we must have R < C 
which in the case of a band-limited AWGN channel is given by 

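$$R < W\log_2\left(1 + \frac{P}{N_0 W}\right) \tag{6.5-45}$$

Dividing both sides by $W$ and using $r = R/W$, as previously defined in Equation 4.6-1
as the bandwidth efficiency, we obtain

$$r < \log_2\left(1 + \frac{P}{N_0 W}\right) \tag{6.5-46}$$

Using the relation

$$\mathcal{E}_b = \frac{\mathcal{E}}{\log_2 M} = \frac{P T_s}{\log_2 M} = \frac{P}{R} \tag{6.5-47}$$

we obtain

$$r < \log_2\left(1 + \frac{\mathcal{E}_b R}{N_0 W}\right) = \log_2\left(1 + r\,\frac{\mathcal{E}_b}{N_0}\right) \tag{6.5-48}$$

from which we have

$$\frac{\mathcal{E}_b}{N_0} > \frac{2^r - 1}{r} \tag{6.5-49}$$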


This relation states the condition for reliable communication in terms of bandwidth
efficiency $r$ and $\mathcal{E}_b/N_0$, which is a measure of power efficiency of a system. A plot of
this relation is given in Figure 4.6-1. The minimum value of $\mathcal{E}_b/N_0$ for which reliable
communication is possible is obtained by letting $r \to 0$ in Equation 6.5-49, which
results in

$$\frac{\mathcal{E}_b}{N_0} > \ln 2 \approx 0.693 \sim -1.6 \text{ dB} \tag{6.5-50}$$

This is the minimum required value of $\mathcal{E}_b/N_0$ for any communication system. No system
can transmit reliably below this limit, and in order to achieve this limit we need to let
$r \to 0$, or equivalently, $W \to \infty$.
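The trade-off in Equation 6.5-49 is simple to tabulate. The sketch below is an illustration with made-up function names; it evaluates the minimum SNR per bit for several bandwidth efficiencies and shows the approach to the limit of Equation 6.5-50 as $r$ becomes small.

```python
import math


def min_ebn0(r):
    """Right-hand side of Equation 6.5-49: (2^r - 1) / r for r > 0."""
    # expm1 avoids cancellation when r is very small
    return math.expm1(r * math.log(2.0)) / r


def min_ebn0_db(r):
    """The same bound expressed in decibels."""
    return 10.0 * math.log10(min_ebn0(r))


if __name__ == "__main__":
    for r in (4.0, 2.0, 1.0, 0.5, 0.1, 1e-3, 1e-6):
        print(f"r = {r:8.6f} bits/s/Hz   Eb/N0 > {min_ebn0_db(r):7.3f} dB")
```

At $r = 1$ the bound is 0 dB, and for very small $r$ it settles near $10\log_{10}(\ln 2)$, about $-1.59$ dB.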


6.6

ACHIEVING CHANNEL CAPACITY WITH ORTHOGONAL SIGNALS 

In Section 4.4-1, we used a simple union bound to show that, for orthogonal signals,
the probability of error can be made as small as desired by increasing the number $M$
of waveforms, provided that $\mathcal{E}_b/N_0 > 2\ln 2$. We indicated that the simple union bound
does not produce the smallest lower bound on the SNR per bit. The problem is that the
upper bound used for $Q(x)$ is very loose for small $x$.

An alternative approach is to use two different upper bounds for $Q(x)$, depending
on the value of $x$. Beginning with Equation 4.4-10 and using the inequality $(1 - x)^n \ge
1 - nx$, which holds for $0 \le x \le 1$ and $n \ge 1$, we observe that

$$1 - [1 - Q(x)]^{M-1} \le (M - 1)Q(x) < Me^{-x^2/2} \tag{6.6-1}$$

This is just the union bound, which is tight when $x$ is large, i.e., for $x > x_0$, where $x_0$
depends on $M$. When $x$ is small, the union bound exceeds unity for large $M$. Since

$$1 - [1 - Q(x)]^{M-1} \le 1 \tag{6.6-2}$$

for all $x$, we may use this bound for $x < x_0$ because it is tighter than the union bound.
Thus Equation 4.4-10 may be upper-bounded as

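$$P_e < \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x_0} e^{-(x-\sqrt{2\gamma})^2/2}\,dx + \frac{M}{\sqrt{2\pi}}\int_{x_0}^{\infty} e^{-x^2/2}\,e^{-(x-\sqrt{2\gamma})^2/2}\,dx \tag{6.6-3}$$

where $\gamma = \mathcal{E}/N_0$.

The value of $x_0$ that minimizes this upper bound is found by differentiating the
right-hand side of Equation 6.6-3 and setting the derivative equal to zero. It is easily
verified that the solution is

$$e^{x_0^2/2} = M \tag{6.6-4}$$

or, equivalently,

$$x_0 = \sqrt{2\ln M} = \sqrt{2\ln 2\,\log_2 M} = \sqrt{2k\ln 2} \tag{6.6-5}$$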

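Having determined $x_0$, we now compute simple exponential upper bounds for the
integrals in Equation 6.6-3. For the first integral, we have

$$\begin{aligned} \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x_0} e^{-(x-\sqrt{2\gamma})^2/2}\,dx &= \frac{1}{\sqrt{\pi}}\int_{-\infty}^{-(\sqrt{2\gamma}-x_0)/\sqrt{2}} e^{-u^2}\,du \\ &= Q\left(\sqrt{2\gamma} - x_0\right), \qquad x_0 \le \sqrt{2\gamma} \\ &< e^{-(\sqrt{2\gamma}-x_0)^2/2}, \qquad x_0 \le \sqrt{2\gamma} \end{aligned} \tag{6.6-6}$$

The second integral is upper-bounded as follows:

$$\begin{aligned} \frac{M}{\sqrt{2\pi}}\int_{x_0}^{\infty} e^{-x^2/2}\,e^{-(x-\sqrt{2\gamma})^2/2}\,dx &= \frac{M}{\sqrt{2\pi}}\,e^{-\gamma/2}\int_{x_0-\sqrt{\gamma/2}}^{\infty} e^{-u^2}\,du \\ &< \begin{cases} Me^{-\gamma/2} & x_0 \le \sqrt{\gamma/2} \\ Me^{-\gamma/2}\,e^{-(x_0-\sqrt{\gamma/2})^2} & x_0 > \sqrt{\gamma/2} \end{cases} \end{aligned} \tag{6.6-7}$$

Combining the bounds for the two integrals and substituting $e^{x_0^2/2}$ for $M$, we obtain

$$P_e < \begin{cases} e^{-(\sqrt{2\gamma}-x_0)^2/2} + e^{(x_0^2-\gamma)/2} & 0 \le x_0 \le \sqrt{\gamma/2} \\ e^{-(\sqrt{2\gamma}-x_0)^2/2} + e^{(x_0^2-\gamma)/2}\,e^{-(x_0-\sqrt{\gamma/2})^2} & \sqrt{\gamma/2} < x_0 \le \sqrt{2\gamma} \end{cases} \tag{6.6-8}$$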
In the range $0 \le x_0 \le \sqrt{\gamma/2}$, the bound may be expressed as

$$P_e < e^{(x_0^2-\gamma)/2}\left(1 + e^{-(x_0-\sqrt{\gamma/2})^2}\right) < 2e^{(x_0^2-\gamma)/2}, \qquad 0 \le x_0 \le \sqrt{\gamma/2} \tag{6.6-9}$$

In the range $\sqrt{\gamma/2} < x_0 \le \sqrt{2\gamma}$, the two terms in Equation 6.6-8 are identical. Hence,

$$P_e < 2e^{-(\sqrt{2\gamma}-x_0)^2/2}, \qquad \sqrt{\gamma/2} < x_0 \le \sqrt{2\gamma} \tag{6.6-10}$$

Now we substitute for $x_0$ and $\gamma$. Since $x_0 = \sqrt{2\ln M} = \sqrt{2k\ln 2}$ and $\gamma = k\gamma_b$, the
bounds in Equations 6.6-9 and 6.6-10 may be expressed as

$$P_e < \begin{cases} 2e^{-k(\gamma_b - 2\ln 2)/2} & \ln M \le \frac{1}{4}\gamma \\ 2e^{-k(\sqrt{\gamma_b} - \sqrt{\ln 2})^2} & \frac{1}{4}\gamma \le \ln M \le \gamma \end{cases} \tag{6.6-11}$$

The first upper bound coincides with the union bound presented earlier, but it is loose
for large values of $M$. The second upper bound is better for large values of $M$. We
note that $P_e \to 0$ as $k \to \infty$ ($M \to \infty$) provided that $\gamma_b > \ln 2$. But $\ln 2$ is the
limiting value of the SNR per bit required for reliable transmission when signaling
at a rate equal to the capacity of the infinite-bandwidth AWGN channel, as shown in
Equation 6.5-50. In fact, when the substitutions $x_0 = \sqrt{2k\ln 2} = \sqrt{2RT\ln 2}$ and
$\gamma = \mathcal{E}/N_0 = TP/N_0 = TC_\infty\ln 2$, which follow from Equation 6.5-44, are made into
the two upper bounds given in Equations 6.6-9 and 6.6-10, the result is

$$P_e < \begin{cases} 2 \times 2^{-T\left(\frac{1}{2}C_\infty - R\right)} & 0 \le R \le \frac{1}{4}C_\infty \\ 2 \times 2^{-T\left(\sqrt{C_\infty} - \sqrt{R}\right)^2} & \frac{1}{4}C_\infty \le R \le C_\infty \end{cases} \tag{6.6-12}$$

Thus we have expressed the bounds in terms of $C_\infty$ and the bit rate in the channel.
The first upper bound is appropriate for rates below $\frac{1}{4}C_\infty$, while the second is tighter
than the first for rates between $\frac{1}{4}C_\infty$ and $C_\infty$. Clearly, the probability of error can
be made arbitrarily small by making $T \to \infty$ ($M \to \infty$ for fixed $R$), provided that
$R < C_\infty = P/(N_0\ln 2)$. Furthermore, we observe that the set of orthogonal waveforms
achieves the channel capacity bound as $M \to \infty$, when the rate $R < C_\infty$.


6.7

THE CHANNEL RELIABILITY FUNCTION

The exponential bounds on the error probability for $M$-ary orthogonal signals on an
infinite-bandwidth AWGN channel given by Equation 6.6-12 may be expressed as

$$P_e < 2 \times 2^{-TE(R)} \tag{6.7-1}$$

FIGURE 6.7-1

Channel reliability function for the infinite-bandwidth
AWGN channel.


The exponential factor

$$E(R) = \begin{cases} \frac{1}{2}C_\infty - R & 0 \le R \le \frac{1}{4}C_\infty \\ \left(\sqrt{C_\infty} - \sqrt{R}\right)^2 & \frac{1}{4}C_\infty \le R \le C_\infty \end{cases} \tag{6.7-2}$$

in Equation 6.7-2 is called the channel reliability function for the infinite-bandwidth
AWGN channel. A plot of $E(R)/C_\infty$ is shown in Figure 6.7-1. Also shown is the
exponential factor for the union bound on $P_e$, given by Equation 4.4-17, which may be
expressed as

$$P_e \le \frac{1}{2} \times 2^{-T\left(\frac{1}{2}C_\infty - R\right)}, \qquad 0 \le R \le \frac{1}{2}C_\infty \tag{6.7-3}$$

Clearly, the exponential factor in Equation 6.7-3 is not as tight as E(R), due to the 
looseness of the union bound. 
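The comparison between the two exponents can be made concrete with a short computation. The following sketch is illustrative and assumes the piecewise form of $E(R)$ given by the two rate regions of Equation 6.6-12; the function names are not from the text.

```python
import math


def reliability(R, C_inf):
    """Channel reliability function E(R) for the infinite-bandwidth AWGN channel."""
    if not 0.0 <= R <= C_inf:
        raise ValueError("E(R) is defined here only for 0 <= R <= C_inf")
    if R <= C_inf / 4.0:
        return C_inf / 2.0 - R
    return (math.sqrt(C_inf) - math.sqrt(R)) ** 2


def union_exponent(R, C_inf):
    """Exponent of the union bound, meaningful for 0 <= R <= C_inf / 2."""
    return C_inf / 2.0 - R


def error_bound(T, R, C_inf):
    """Upper bound 2 * 2^(-T E(R)) on the error probability (Equation 6.7-1)."""
    return 2.0 * 2.0 ** (-T * reliability(R, C_inf))


if __name__ == "__main__":
    C_inf = 1.0  # rates are expressed as fractions of C_inf
    for R in (0.0, 0.1, 0.25, 0.4, 0.5, 0.75, 1.0):
        e_r = reliability(R, C_inf)
        e_u = union_exponent(R, C_inf) if R <= C_inf / 2 else float("nan")
        print(f"R/C = {R:4.2f}   E(R) = {e_r:.4f}   union exponent = {e_u:.4f}")
```

The two exponents agree up to $R = \frac{1}{4}C_\infty$; beyond that point the union-bound exponent falls off faster and reaches zero at $\frac{1}{2}C_\infty$, while $E(R)$ stays positive up to $C_\infty$.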

The bound given by Equations 6.7-1 and 6.7-2 has been shown by Gallager (1965)
to be exponentially tight. This means that there does not exist another reliability
function, say $E_1(R)$, satisfying the condition $E_1(R) > E(R)$ for any $R$. Consequently, the
error probability is bounded from above and below as

$$K_l\,2^{-TE(R)} \le P_e \le K_u\,2^{-TE(R)} \tag{6.7-4}$$

where the constants have only a weak dependence on $T$ in the sense that

$$\lim_{T \to \infty}\frac{1}{T}\ln K_l = \lim_{T \to \infty}\frac{1}{T}\ln K_u = 0 \tag{6.7-5}$$

Since orthogonal signals are asymptotically optimal for large $M$, the lower bound
in Equation 6.7-4 applies for any signal set. Hence, the reliability function $E(R)$ given
by Equation 6.7-2 determines the exponential characteristics of the error probability
for digital signaling over the infinite-bandwidth AWGN channel.

Although we have presented the channel reliability function for the infinite- 
bandwidth AWGN channel, the notion of channel reliability function can be applied to 
many channel models. In general, for many channel models, the average error proba- 
bility over all the possible codes generated randomly satisfies an expression similar to 
Equation 6.7-4 of the form



(6.7-6) 


where E(R) is positive for all R < C. Therefore, if R < C, it is possible to arbitrarily 
decrease the error probability by increasing n. This, of course, requires unlimited de- 
coding complexity and delay. The exact expression for the channel reliability function 
can be derived for just a few channel models. For more details on the channel reliability 
function, the interested reader is referred to the book by Gallager (1968). 

Although the error probability can be made small by increasing the number of
orthogonal, biorthogonal, or simplex signals, with $R < C_\infty$, for a relatively modest
number of signals, there is a large gap between the actual performance and the best
achievable performance given by the channel capacity formula. For example, from
Figure 4.6-1, we observe that a set of $M = 16$ orthogonal signals detected coherently
requires an SNR per bit of approximately 7.5 dB to achieve a bit error rate of $P_e = 10^{-5}$.
In contrast, the channel capacity formula indicates that for $C/W = 0.5$, reliable
transmission is possible with an SNR of -0.8 dB, as indicated in Figure 6.5-7. This
represents a rather large difference of 8.3 dB/bit and serves as a motivation for searching
for more efficient signaling waveforms. In this chapter and in Chapters 7 and 8, we
demonstrate that coded waveforms can reduce this gap considerably.

Similar gaps in performance also exist in the bandwidth-limited region of 
Figure 4.6-1, where R/W > 1. In this region, however, we must be more clever in 
how we use coding to improve performance, because we cannot expand the bandwidth 
as in the power-limited region. The use of coding techniques for bandwidth-efficient 
communication is treated in Chapters 7 and 8. 


The design of coded modulation for efficient transmission of information may be divided 
into two basic approaches. One is the algebraic approach, which is primarily concerned 
with the design of coding and decoding techniques for specific classes of codes, such as 
cyclic block codes and convolutional codes. The second is the probabilistic approach, 
which is concerned with the analysis of the performance of a general class of coded 
signals. This approach yields bounds on the probability of error that can be attained for 
communication over a channel having some specified characteristic. 

In this section, we adopt the probabilistic approach to coded modulation. The 
algebraic approach, based on block codes and on convolutional codes, is treated in 
Chapters 7 and 8. 


6.8


THE CHANNEL CUTOFF RATE 


6.8-1 Bhattacharyya and Chernov Bounds 

Let us consider a memoryless channel with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$,
which is characterized by the conditional PDF $p(y|x)$. By the memoryless assumption
of the channel

$$p(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} p(y_i|x_i) \tag{6.8-1}$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ are input and output sequences of
length $n$. We further assume that from all possible input sequences of length $n$, a subset of
size $M = 2^k$ denoted by $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M$ and called codewords is used for transmission.
Let us represent by $P_{e|m}$ the error probability when $\mathbf{x}_m$ is transmitted and a
maximum-likelihood detector is employed. By the union bound and using Equations 4.2-64 to
4.2-67 we can write

$$\begin{aligned} P_{e|m} &= \sum_{\substack{m'=1 \\ m' \ne m}}^{M} P[\mathbf{y} \in D_{m'} \mid \mathbf{x}_m \text{ sent}] \\ &\le \sum_{\substack{m'=1 \\ m' \ne m}}^{M} P[\mathbf{y} \in D_{mm'} \mid \mathbf{x}_m \text{ sent}] \end{aligned} \tag{6.8-2}$$

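where $D_{mm'}$ denotes the decision region for $m'$ in a binary system consisting of $\mathbf{x}_m$ and
$\mathbf{x}_{m'}$ and is given by

$$\begin{aligned} D_{mm'} &= \{\mathbf{y} : p(\mathbf{y}|\mathbf{x}_{m'}) > p(\mathbf{y}|\mathbf{x}_m)\} \\ &= \left\{\mathbf{y} : \ln\frac{p(\mathbf{y}|\mathbf{x}_{m'})}{p(\mathbf{y}|\mathbf{x}_m)} > 0\right\} \\ &= \{\mathbf{y} : Z_{mm'} > 0\} \end{aligned} \tag{6.8-3}$$

in which we have defined

$$Z_{mm'} = \ln\frac{p(\mathbf{y}|\mathbf{x}_{m'})}{p(\mathbf{y}|\mathbf{x}_m)} \tag{6.8-4}$$

As in Section 4.2-3, we denote $P[\mathbf{y} \in D_{mm'} \mid \mathbf{x}_m \text{ sent}]$ by $P_{m \to m'}$ and call it pairwise
error probability, or PEP. It is clear from Equation 6.8-3 that

$$\begin{aligned} P_{m \to m'} &= P[Z_{mm'} > 0 \mid \mathbf{x}_m] \\ &\le E\left[e^{\lambda Z_{mm'}} \mid \mathbf{x}_m\right] \end{aligned} \tag{6.8-5}$$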


where in the last step we have used the Chernov bound given by Equation 2.4^1, and 
the inequality is satisfied for all $\lambda > 0$. Substituting for $Z_{mm'}$ from Equation 6.8-4, we
obtain

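$$\begin{aligned} P_{m \to m'} &\le \sum_{\mathbf{y} \in \mathcal{Y}^n} e^{\lambda\ln\frac{p(\mathbf{y}|\mathbf{x}_{m'})}{p(\mathbf{y}|\mathbf{x}_m)}}\,p(\mathbf{y}|\mathbf{x}_m) \\ &= \sum_{\mathbf{y} \in \mathcal{Y}^n} p^{\lambda}(\mathbf{y}|\mathbf{x}_{m'})\,p^{1-\lambda}(\mathbf{y}|\mathbf{x}_m), \qquad \lambda > 0 \end{aligned} \tag{6.8-6}$$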
This is the Chernov bound for the pairwise error probability. A simpler form of this
bound is obtained when we put $\lambda = \frac{1}{2}$. In this case the resulting bound

$$P_{m \to m'} \le \sum_{\mathbf{y} \in \mathcal{Y}^n} \sqrt{p(\mathbf{y}|\mathbf{x}_m)\,p(\mathbf{y}|\mathbf{x}_{m'})} \tag{6.8-7}$$

is called the Bhattacharyya bound. If the channel is memoryless, the Chernov bound
reduces to

$$P_{m \to m'} \le \prod_{i=1}^{n}\left(\sum_{y_i \in \mathcal{Y}} p^{\lambda}(y_i|x_{m'i})\,p^{1-\lambda}(y_i|x_{mi})\right), \qquad \lambda > 0 \tag{6.8-8}$$

The Bhattacharyya bound for a memoryless channel is given by

$$P_{m \to m'} \le \prod_{i=1}^{n}\left(\sum_{y_i \in \mathcal{Y}} \sqrt{p(y_i|x_{m'i})\,p(y_i|x_{mi})}\right) \tag{6.8-9}$$


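Let us define two functions $\Delta^{(\lambda)}_{x_1 \to x_2}$ and $\Delta_{x_1,x_2}$, called Chernov and Bhattacharyya
parameters, respectively, as

$$\begin{aligned} \Delta^{(\lambda)}_{x_1 \to x_2} &= \sum_{y \in \mathcal{Y}} p^{\lambda}(y|x_2)\,p^{1-\lambda}(y|x_1) \\ \Delta_{x_1,x_2} &= \sum_{y \in \mathcal{Y}} \sqrt{p(y|x_1)\,p(y|x_2)} \end{aligned} \tag{6.8-10}$$

Note that $\Delta^{(\lambda)}_{x_1 \to x_1} = \Delta_{x_1,x_1} = 1$ for all $x_1 \in \mathcal{X}$. Using these definitions, Equations 6.8-8
and 6.8-9 reduce to

$$P_{m \to m'} \le \prod_{i=1}^{n} \Delta^{(\lambda)}_{x_{mi} \to x_{m'i}}, \qquad \lambda > 0 \tag{6.8-11}$$

and

$$P_{m \to m'} \le \prod_{i=1}^{n} \Delta_{x_{mi},x_{m'i}} \tag{6.8-12}$$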

EXAMPLE 6.8-1. Assume $\mathbf{x}_m$ and $\mathbf{x}_{m'}$ are two binary sequences of length $n$ which
differ in $d$ components; $d$ is called the Hamming distance between the two sequences.
If a binary symmetric channel with crossover probability $p$ is employed to transmit $\mathbf{x}_m$
and $\mathbf{x}_{m'}$, we have

$$\begin{aligned} P_{m \to m'} &\le \prod_{i=1}^{n} \Delta_{x_{mi},x_{m'i}} \\ &= \prod_{\substack{i=1 \\ x_{mi} \ne x_{m'i}}}^{n}\left(\sqrt{p(1-p)} + \sqrt{(1-p)p}\right) \\ &= \left(\sqrt{4p(1-p)}\right)^{d} \end{aligned} \tag{6.8-13}$$

where we have used the fact that if $x_{mi} = x_{m'i}$, then $\Delta_{x_{mi},x_{m'i}} = 1$.
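If, instead of the BSC, we use BPSK modulation over an AWGN channel, in which
0 and 1 in each sequence are mapped into $-\sqrt{\mathcal{E}_c}$ and $+\sqrt{\mathcal{E}_c}$ and $\mathcal{E}_c$ denotes energy per
component, we will have

$$P_{m \to m'} \le \prod_{\substack{i=1 \\ x_{mi} \ne x_{m'i}}}^{n}\int_{-\infty}^{\infty}\sqrt{p\left(y \mid \sqrt{\mathcal{E}_c}\right)p\left(y \mid -\sqrt{\mathcal{E}_c}\right)}\,dy = \left(e^{-\mathcal{E}_c/N_0}\right)^{d} \tag{6.8-14}$$

In both cases the Bhattacharyya bound is of the form $\Delta^d$, where for the BSC
$\Delta = \sqrt{4p(1-p)}$ and for an AWGN channel with BPSK modulation $\Delta = e^{-\mathcal{E}_c/N_0}$. If
$p \ne \frac{1}{2}$ and $\mathcal{E}_c > 0$, in both cases $\Delta < 1$ and therefore as $d$ becomes large, the error
probability goes to zero.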
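As a numerical illustration of the form $\Delta^d$, the sketch below computes the Bhattacharyya parameter for the two channels of this example and the resulting bound on the pairwise error probability. The function names and the sample operating points are arbitrary choices, not values from the text.

```python
import math


def bhattacharyya_bsc(p):
    """Bhattacharyya parameter of a BSC with crossover probability p."""
    return math.sqrt(4.0 * p * (1.0 - p))


def bhattacharyya_bpsk(ec_over_n0):
    """Bhattacharyya parameter for BPSK on an AWGN channel (Ec/N0 as a linear ratio)."""
    return math.exp(-ec_over_n0)


def pep_bound(delta, d):
    """Bhattacharyya bound on the pairwise error probability at Hamming distance d."""
    return delta ** d


if __name__ == "__main__":
    for d in (1, 5, 10, 20):
        bsc = pep_bound(bhattacharyya_bsc(0.1), d)
        awgn = pep_bound(bhattacharyya_bpsk(1.0), d)
        print(f"d = {d:2d}   BSC (p = 0.1): {bsc:.3e}   BPSK (Ec/N0 = 1): {awgn:.3e}")
```

Both bounds decay geometrically in $d$, and they become trivial (equal to 1) exactly when $p = \frac{1}{2}$ or $\mathcal{E}_c = 0$.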


6.8-2 Random Coding 

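Let us assume that instead of having two specific codewords $\mathbf{x}_m$ and $\mathbf{x}_{m'}$, we generate
all $M$ codewords according to some PDF $p(x)$ on the input alphabet $\mathcal{X}$. We assume
that all codeword components and all codewords are drawn independently according
to $p(x)$. Therefore, each codeword $\mathbf{x}_m = (x_{m1}, x_{m2}, \ldots, x_{mn})$ is generated according
to $\prod_{i=1}^{n} p(x_{mi})$. If we denote the average of the pairwise error probability over the set
of randomly generated codes by $\overline{P_{m \to m'}}$, we have

$$\begin{aligned} \overline{P_{m \to m'}} &= \sum_{\mathbf{x}_m \in \mathcal{X}^n}\sum_{\mathbf{x}_{m'} \in \mathcal{X}^n}\prod_{i=1}^{n} p(x_{mi})\,p(x_{m'i})\,P_{m \to m'} \\ &\le \sum_{\mathbf{x}_m \in \mathcal{X}^n}\sum_{\mathbf{x}_{m'} \in \mathcal{X}^n}\prod_{i=1}^{n}\left(p(x_{mi})\,p(x_{m'i})\,\Delta^{(\lambda)}_{x_{mi} \to x_{m'i}}\right) \\ &= \left(\sum_{x_1 \in \mathcal{X}}\sum_{x_2 \in \mathcal{X}} p(x_1)\,p(x_2)\,\Delta^{(\lambda)}_{x_1 \to x_2}\right)^{n}, \qquad \lambda > 0 \end{aligned} \tag{6.8-15}$$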
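Let us define

$$\begin{aligned} R_0(p, \lambda) &= -\log_2\left[\sum_{x_1 \in \mathcal{X}}\sum_{x_2 \in \mathcal{X}} p(x_1)\,p(x_2)\,\Delta^{(\lambda)}_{x_1 \to x_2}\right] \\ &= -\log_2 E\left[\Delta^{(\lambda)}_{X_1 \to X_2}\right], \qquad \lambda > 0 \end{aligned} \tag{6.8-16}$$

where $X_1$ and $X_2$ are independent random variables with joint PDF $p(x_1)p(x_2)$. Using
this definition, Equation 6.8-15 can be written as

$$\overline{P_{m \to m'}} \le 2^{-nR_0(p,\lambda)}, \qquad \lambda > 0 \tag{6.8-17}$$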


We define $\overline{P_{e|m}}$ as the average of $P_{e|m}$ over the set of random codes generated using
$p(x)$. Using this definition and Equation 6.8-2, we obtain

$$\begin{aligned} \overline{P_{e|m}} &\le \sum_{\substack{m'=1 \\ m' \ne m}}^{M} \overline{P_{m \to m'}} \\ &\le \sum_{\substack{m'=1 \\ m' \ne m}}^{M} 2^{-nR_0(p,\lambda)} \\ &= (M - 1)\,2^{-nR_0(p,\lambda)} \\ &\le 2^{-n(R_0(p,\lambda) - R_c)}, \qquad \lambda > 0 \end{aligned} \tag{6.8-18}$$

We have used the relation $M = 2^k = 2^{nR_c}$, where $R_c = \frac{k}{n}$ denotes the rate of the code.
Since the right-hand side of the inequality is independent of $m$, by averaging over $m$
we have

$$\overline{P_e} \le 2^{-n(R_0(p,\lambda) - R_c)}, \qquad \lambda > 0 \tag{6.8-19}$$

where $\overline{P_e}$ is the average error probability over the ensemble of random codes generated
according to $p(x)$. Equation 6.8-19 states that if $R_c < R_0(p, \lambda)$ for some input PDF
$p(x)$ and some $\lambda > 0$, then for $n$ large enough, the average error probability over
the ensemble of codes can be made arbitrarily small. This means that among the set
of codes generated randomly, there must exist at least one code for which the error
probability goes to zero as $n \to \infty$. This is an example of the random coding argument
first introduced by Shannon in the proof of the channel capacity theorem.

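The maximum value of $R_0(p, \lambda)$ over all probability density functions $p(x)$ and all
$\lambda > 0$ gives the quantity $R_0$, known as the channel cutoff rate, defined by

$$\begin{aligned} R_0 &= \max_{p(x)}\,\sup_{\lambda > 0}\,R_0(p, \lambda) \\ &= \max_{p(x)}\,\sup_{\lambda > 0}\,\left\{-\log_2 E\left[\Delta^{(\lambda)}_{X_1 \to X_2}\right]\right\} \end{aligned} \tag{6.8-20}$$

Clearly, if either $\mathcal{X}$ or $\mathcal{Y}$ or both are continuous, the corresponding sums in the
development of $R_0$ are substituted with appropriate integrals.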
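For a discrete memoryless channel, $R_0(p, \lambda)$ in Equation 6.8-16 is a finite double sum and can be evaluated directly. The sketch below is illustrative only: the channel is assumed to be given as rows `P[x][y]`, the supremum over $\lambda$ is replaced by a coarse grid search on $(0, 1)$, and no maximization over the input PMF is attempted. All names are hypothetical.

```python
import math


def chernov_parameter(P, x1, x2, lam):
    """Chernov parameter for the ordered pair (x1, x2), as in Equation 6.8-10."""
    return sum(P[x2][y] ** lam * P[x1][y] ** (1.0 - lam) for y in range(len(P[0])))


def r0(P, px, lam):
    """R0(p, lambda) in bits (Equation 6.8-16) for channel matrix P and input PMF px."""
    n_in = len(P)
    expectation = sum(
        px[x1] * px[x2] * chernov_parameter(P, x1, x2, lam)
        for x1 in range(n_in)
        for x2 in range(n_in)
    )
    return -math.log2(expectation)


def best_lambda(P, px, grid=100):
    """Grid search over lambda in (0, 1); returns (largest R0 found, lambda)."""
    candidates = [i / grid for i in range(1, grid)]
    return max((r0(P, px, lam), lam) for lam in candidates)


if __name__ == "__main__":
    p = 0.1
    bsc = [[1 - p, p], [p, 1 - p]]
    value, lam = best_lambda(bsc, [0.5, 0.5])
    print(f"BSC, p = {p}: R0 = {value:.5f} bits at lambda = {lam}")
```

For the BSC with a uniform input the search settles on $\lambda = \frac{1}{2}$, where the Chernov parameter coincides with the Bhattacharyya parameter.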
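For symmetric channels, the optimal value of $\lambda$ that maximizes the cutoff rate is
$\lambda = \frac{1}{2}$, for which the Chernov bound reduces to the Bhattacharyya bound and

$$\begin{aligned} R_0 &= \max_{p(x)}\left\{-\log_2 E\left[\Delta_{X_1,X_2}\right]\right\} \\ &= \max_{p(x)}\left\{-\log_2\left[\sum_{y \in \mathcal{Y}}\left(\sum_{x \in \mathcal{X}} p(x)\sqrt{p(y|x)}\right)^{2}\right]\right\} \end{aligned} \tag{6.8-21}$$

In addition, for these channels the PDF maximizing $R_0(p, \lambda)$ is a uniform PDF; i.e.,
if $Q = |\mathcal{X}|$, we have $p(x) = \frac{1}{Q}$ for all $x \in \mathcal{X}$. In this case we have

$$\begin{aligned} R_0 &= -\log_2\left[\frac{1}{Q^2}\sum_{y \in \mathcal{Y}}\left(\sum_{x \in \mathcal{X}}\sqrt{p(y|x)}\right)^{2}\right] \\ &= 2\log_2 Q - \log_2\left[\sum_{y \in \mathcal{Y}}\left(\sum_{x \in \mathcal{X}}\sqrt{p(y|x)}\right)^{2}\right] \end{aligned} \tag{6.8-22}$$

Using the inequality

$$\left(\sum_{x \in \mathcal{X}}\sqrt{p(y|x)}\right)^{2} \ge \sum_{x \in \mathcal{X}} p(y|x) \tag{6.8-23}$$

and summing over all $y$, we obtain

$$\sum_{y \in \mathcal{Y}}\left(\sum_{x \in \mathcal{X}}\sqrt{p(y|x)}\right)^{2} \ge \sum_{x \in \mathcal{X}}\sum_{y \in \mathcal{Y}} p(y|x) = Q \tag{6.8-24}$$

Employing this result in Equation 6.8-22 yields

$$R_0 = 2\log_2 Q - \log_2\left[\sum_{y \in \mathcal{Y}}\left(\sum_{x \in \mathcal{X}}\sqrt{p(y|x)}\right)^{2}\right] \le \log_2 Q \tag{6.8-25}$$

as expected.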

For a symmetric binary-input channel, these relations can be further reduced. In
this case

$$\Delta_{x_1,x_2} = \begin{cases} \Delta & x_1 \ne x_2 \\ 1 & x_1 = x_2 \end{cases} \tag{6.8-26}$$

where $\Delta$ is the Bhattacharyya parameter for the binary input channel. In this case
$Q = 2$ and we obtain

$$R_0 = -\log_2\frac{1 + \Delta}{2} = 1 - \log_2(1 + \Delta) \tag{6.8-27}$$

Since reliable communication is possible at all rates lower than the cutoff rate, we
conclude that $R_0 \le C$. In fact, we can interpret the cutoff rate as the supremum of the
rates at which a bound on the average error probability of the form $2^{-n(R_0 - R_c)}$ is possible.
The simplicity of the exponent in this bound is particularly attractive in comparison
with the general form of the bound on error probability given by $2^{-nE(R_c)}$, where
$E(R_c)$ denotes the channel reliability function. Note that $R_0 - R_c$ is positive for all
rates less than $R_0$, but $E(R_c)$ is positive for all rates less than capacity. We will see in
Chapter 8 that sequential decoding of convolutional codes is practical at rates lower
than $R_0$. Therefore, we can also interpret $R_0$ as the supremum of the rates at which
sequential decoding is practical.


EXAMPLE 6.8-2. For a BSC with crossover probability $p$ we have $\mathcal{X} = \mathcal{Y} = \{0, 1\}$.
Using the symmetry of the channel, the optimal $\lambda$ is $\frac{1}{2}$ and the optimal input distribution
is a uniform distribution. Therefore,

$$\begin{aligned} R_0 &= 2\log_2 2 - \log_2\left[\sum_{y=0,1}\left(\sum_{x=0,1}\sqrt{p(y|x)}\right)^{2}\right] \\ &= 2\log_2 2 - \log_2\left[\left(\sqrt{1-p} + \sqrt{p}\right)^{2} + \left(\sqrt{p} + \sqrt{1-p}\right)^{2}\right] \\ &= 2\log_2 2 - \log_2\left(2 + 4\sqrt{p(1-p)}\right) \\ &= \log_2\frac{2}{1 + \sqrt{4p(1-p)}} \end{aligned} \tag{6.8-28}$$

We could also use the fact that $\Delta = \sqrt{4p(1-p)}$ and use Equation 6.8-27 to obtain

$$R_0 = 1 - \log_2(1 + \Delta) = 1 - \log_2\left(1 + \sqrt{4p(1-p)}\right) \tag{6.8-29}$$

A plot of $R_0$ versus $p$ is shown in Figure 6.8-1. The capacity of this channel, $C =
1 - H_b(p)$, is also shown on the same plot. It is observed that $C \ge R_0$ for all $p$.



FIGURE 6.8-1 

Cutoff rate and channel capacity plots for a binary symmetric channel. 
If the BSC channel is obtained by binary quantization of the output of an AWGN
channel using BPSK modulation, we have

$$p = Q\left(\sqrt{\frac{2\mathcal{E}_c}{N_0}}\right) \tag{6.8-30}$$

where $\mathcal{E}_c$ denotes energy per component of $\mathbf{x}$. Note that with this notation the total
energy in $\mathbf{x}$ is $\mathcal{E} = n\mathcal{E}_c$; and since each $\mathbf{x}$ carries $k = \log_2 M$ bits of information, we
have $\mathcal{E}_b = \frac{\mathcal{E}}{k} = \frac{n}{k}\mathcal{E}_c$, or $\mathcal{E}_c = R_c\mathcal{E}_b$, where $R_c = \frac{k}{n}$ is the rate of the code. If the rate of
the code tends to $R_0$, we will have

$$p = Q\left(\sqrt{2R_0\gamma_b}\right) \tag{6.8-31}$$

where $\gamma_b = \mathcal{E}_b/N_0$. From the pair of relations

$$\begin{aligned} p &= Q\left(\sqrt{2R_0\gamma_b}\right) \\ R_0 &= \log_2\frac{2}{1 + \sqrt{4p(1-p)}} \end{aligned} \tag{6.8-32}$$

we can plot $R_0$ as a function of $\gamma_b$. Similarly, from the pair of relations

$$\begin{aligned} p &= Q\left(\sqrt{2C\gamma_b}\right) \\ C &= 1 - H_b(p) \end{aligned} \tag{6.8-33}$$

we can plot $C$ as a function of $\gamma_b$. These plots that compare $R_0$ and $C$ as functions of $\gamma_b$
are shown in Figure 6.8-2. From this figure it is seen that there exists a gap of roughly
2-2.5 dB between $R_0$ and $C$.



FIGURE 6.8-2 

Capacity and cutoff rate for an output quantized BPSK scheme. 
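EXAMPLE 6.8-3. For an AWGN channel with BPSK modulation we have $\mathcal{X} = \{\pm\sqrt{\mathcal{E}_c}\}$.
The output alphabet in this case is the set of real numbers $\mathbb{R}$. We have

$$\begin{aligned} \int_{-\infty}^{\infty}\left(\sum_{x \in \mathcal{X}}\sqrt{p(y|x)}\right)^{2}dy &= \int_{-\infty}^{\infty}\left(\sqrt{p\left(y \mid -\sqrt{\mathcal{E}_c}\right)} + \sqrt{p\left(y \mid \sqrt{\mathcal{E}_c}\right)}\right)^{2}dy \\ &= 2 + 2\int_{-\infty}^{\infty}\frac{1}{\sqrt{\pi N_0}}\,e^{-\frac{y^2 + \mathcal{E}_c}{N_0}}\,dy \\ &= 2 + 2e^{-\mathcal{E}_c/N_0} \end{aligned} \tag{6.8-34}$$

Finally, using Equation 6.8-22, we have

$$\begin{aligned} R_0 &= 2\log_2 2 - \log_2\left(2 + 2e^{-\mathcal{E}_c/N_0}\right) \\ &= \log_2\frac{2}{1 + e^{-\mathcal{E}_c/N_0}} \\ &= \log_2\frac{2}{1 + e^{-R_c\mathcal{E}_b/N_0}} \end{aligned} \tag{6.8-35}$$

Here $\Delta = e^{-\mathcal{E}_c/N_0}$, and using Equation 6.8-27 will result in the same expression for $R_0$.
A plot of $R_0$, as well as the capacity for this channel, which is given by Equation 6.5-31,
is shown in Figure 6.8-3.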

In Figure 6.8-4 plots of Rq and C for BPSK with continuous output (soft decision) 
and BPSK with binary quantized output (hard decision) are compared. 



FIGURE 6.8-3 

Cutoff rate and channel capacity plots for an AWGN channel with BPSK modulation. 
FIGURE 6.8-4

Capacity and cutoff rate for a hard and soft decision decoding of a BPSK scheme. 

Comparing the Rq’s for hard and soft decisions, we observe that soft decision has 
an advantage of roughly 2 dB over hard decision. If we compare capacities, we observe 
a similar 2-dB advantage for soft decision. Comparing Rq and C, we observe that in 
both soft and hard decisions, capacity has an advantage of roughly 2-2.5 dB over Rq. 
This gap is larger at lower SNRs and decreases to 2 dB at higher SNRs. 
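The soft-versus-hard comparison of cutoff rates can be reproduced from the closed forms in Examples 6.8-2 and 6.8-3. The following sketch is only an illustration: for a chosen code rate it solves $R_0 = R_c$ for $\mathcal{E}_c/N_0$ by bisection and converts the result to SNR per bit through $\mathcal{E}_c = R_c\mathcal{E}_b$. The function names, the bracket used for bisection, and the choice of rate $\frac{1}{2}$ are assumptions.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def r0_soft(ec_n0):
    """Cutoff rate for BPSK with unquantized output (Equation 6.8-35)."""
    return 1.0 - math.log2(1.0 + math.exp(-ec_n0))


def r0_hard(ec_n0):
    """Cutoff rate for BPSK with binary-quantized output (Equations 6.8-29 and 6.8-30)."""
    p = q_function(math.sqrt(2.0 * ec_n0))
    return 1.0 - math.log2(1.0 + math.sqrt(4.0 * p * (1.0 - p)))


def required_ebn0_db(r0_func, rate, iterations=100):
    """SNR per bit in dB at which r0_func equals the code rate (0 < rate < 1)."""
    lo, hi = 1e-9, 50.0  # bracket for Ec/N0; both cutoff rates increase with Ec/N0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if r0_func(mid) < rate:
            lo = mid
        else:
            hi = mid
    ec_n0 = 0.5 * (lo + hi)
    return 10.0 * math.log10(ec_n0 / rate)


if __name__ == "__main__":
    rate = 0.5
    soft = required_ebn0_db(r0_soft, rate)
    hard = required_ebn0_db(r0_hard, rate)
    print(f"rate {rate}: soft {soft:.2f} dB, hard {hard:.2f} dB, gap {hard - soft:.2f} dB")
```

At rate $\frac{1}{2}$ this calculation gives a soft-decision advantage of about 2 dB, in line with the figure discussed above.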


6.9

BIBLIOGRAPHICAL NOTES AND REFERENCES 

Information theory, the mathematical theory of communication, was founded by 
Shannon (1948, 1959). Source coding has been an area of intense research activity 
since the publication of Shannon’s classic papers in 1948 and the paper by Huffman 
(1952). Over the years, major advances have been made in the development of highly 
efficient source data compression algorithms. Of particular significance is the research 
on universal source coding and universal quantization published by Ziv (1985), Ziv and 
Lempel (1977, 1978), Davisson (1973), Gray (1975), and Davisson et al. (1981). 

Treatments of rate distortion theory are found in the books by Gallager (1968), 
Berger (1971), Viterbi and Omura (1979), Blahut (1987), and Gray (1990). For practical
applications of rate distortion theory to image and video compression, the reader is 
referred to the IEEE Signal Processing Magazine , November 1998, and to the book by 
Gibson et al. (1998). The paper by Berger and Gibson (1998) on lossy source coding 
provides an overview of the major developments on this topic over the past 50 years. 

Over the past decade, we have also seen a number of important developments 
in vector quantization. A comprehensive treatment of vector quantization and signal 
compression is provided in the book of Gersho and Gray (1992). The survey paper by
Gray and Neuhoff (1998) describes the numerous advances that have been made on the 
topic of quantization over the past 50 years and includes a list of over 500 references. 

Pioneering work on channel characterization in terms of channel capacity and 
random coding was done by Shannon (1948a, b; 1949). Additional contributions were
subsequently made by Gilbert (1952), Elias (1955), Gallager (1965), Wyner (1965), 
Shannon et al. (1967), Forney (1968), and Viterbi (1969). All these early publications are 
contained in the IEEE Press book entitled Key Papers in the Development of Information 
Theory , edited by Slepian (1974). The paper by Verdu (1998) in the 50th Anniversary 
Commemorative Issue of the Transactions on Information Theory gives a historical 
perspective of the numerous advances in information theory over the past 50 years. 

The use of the cutoff rate parameter as a design criterion was proposed and devel- 
oped by Wozencraft and Kennedy (1966) and by Wozencraft and Jacobs (1965). It was 
used by Jordan (1966) in the design of coded waveforms for M- ary orthogonal signals 
with coherent and noncoherent detection. Following these pioneering works, the cutoff 
rate has been widely used as a design criterion for coded signals in a variety of different 
channel conditions. 

For comprehensive study of the ideas introduced in this chapter, the reader is 
referred to standard texts on information theory including Gallager (1968) and Cover 
and Thomas (2006). 


PROBLEMS 

6.1 Prove that $\ln u \le u - 1$ and also demonstrate the validity of this inequality by plotting $\ln u$ 
and $u - 1$ on the same graph. 

6.2 X and Y are two discrete random variables with probabilities 

$$P(X = x, Y = y) = P(x, y)$$

Show that $I(X; Y) \ge 0$, with equality if and only if X and Y are statistically independent. 
Hint: Use the inequality $\ln u < u - 1$, for $0 < u < 1$, to show that $-I(X; Y) \le 0$. 

6.3 The output of a DMS consists of the possible letters $x_1, x_2, \ldots, x_n$, which occur with 
probabilities $p_1, p_2, \ldots, p_n$, respectively. Prove that the entropy H(X) of the source is at 
most $\log n$. Find the probability density function for which $H(X) = \log n$. 
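
The entropy bound discussed in this problem can be checked numerically before attempting the proof. The following is only a minimal sketch; the function name `entropy` and the two example distributions are hypothetical and are not taken from the text.

```python
import math

def entropy(probs, base=2.0):
    """Entropy of a discrete distribution; zero-probability letters contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical four-letter examples (assumptions chosen for illustration only).
n = 4
uniform = [1.0 / n] * n
skewed = [0.5, 0.25, 0.125, 0.125]

h_uniform = entropy(uniform)   # equals log2(4) up to rounding
h_skewed = entropy(skewed)     # strictly smaller than log2(4)
print(h_uniform, h_skewed, math.log2(n))
```

A numerical check of this kind does not replace the proof; it only illustrates that a non-uniform distribution falls below the bound.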

6.4 Let X be a geometrically distributed random variable, i.e., 

$$P(X = k) = p(1 - p)^{k-1}, \quad k = 1, 2, 3, \ldots$$

1. Find the entropy of X. 

2. Given that X > K, where K is a positive integer, what is the entropy of X? 

6.5 Two binary random variables X and Y are distributed according to the joint distribution 
$P(X = Y = 0) = P(X = 0, Y = 1) = P(X = Y = 1) = \frac{1}{3}$. Compute 
H(X), H(Y), H(X|Y), H(Y|X), and H(X, Y). 
6.6 Let X and Y denote two jointly distributed, discrete-valued random variables. 

1. Show that 

$$H(X) = -\sum_{x,y} P(x, y) \log P(x)$$

and 

$$H(Y) = -\sum_{x,y} P(x, y) \log P(y)$$

2. Use the above result to show that 

$$H(X, Y) \le H(X) + H(Y)$$

When does equality hold? 

3. Show that 

$$H(X|Y) \le H(X)$$

with equality if and only if X and Y are independent. 

6.7 Let Y = g(X), where g denotes a deterministic function. Show that, in general, $H(Y) \le H(X)$. 
When does equality hold? 

6.8 Show that, for statistically independent events, 

$$H(X_1 X_2 \cdots X_n) = \sum_{i=1}^{n} H(X_i)$$

6.9 Show that 

$$I(X_3; X_2 | X_1) = H(X_3 | X_1) - H(X_3 | X_1 X_2)$$

and that 

$$H(X_3 | X_1) \ge H(X_3 | X_1 X_2)$$

6.10 Let X be a random variable with PDF $p_X(x)$, and let Y = aX + b be a linear transforma- 
tion of X, where a and b are two constants. Determine the differential entropy H(Y) in 
terms of H(X). 

6.11 The outputs $x_1$, $x_2$, and $x_3$ of a DMS with corresponding probabilities $p_1 = 0.45$, $p_2 = 0.35$, 
and $p_3 = 0.20$ are transformed by the linear transformation Y = aX + b, where 
a and b are constants. Determine the entropy H(Y) and comment on what effect the 
transformation has had on the entropy of X. 

6.12 A Markov process is a process with one-step memory, i.e., a process such that 

$$P(x_n | x_{n-1}, x_{n-2}, x_{n-3}, \ldots) = P(x_n | x_{n-1})$$

for all n. Show that, for a stationary Markov process, the entropy rate is given by 
$H(X_n | X_{n-1})$. 

6.13 A first-order Markov source is characterized by the state probabilities $P(x_i)$, $i = 1, 2, \ldots, L$, 
and the transition probabilities $P(x_k | x_i)$, $k = 1, 2, \ldots, L$, and $k \ne i$. The entropy of the 
Markov source is 

$$H(X) = \sum_{k=1}^{L} P(x_k) H(X | x_k)$$

where $H(X | x_k)$ is the entropy conditioned on the source being in state $x_k$. Determine the 
entropy of the binary, first-order Markov source shown in Figure P6.13, which has the 
transition probabilities $P(x_2 | x_1) = 0.2$ and $P(x_1 | x_2) = 0.3$. Note that the conditional 
entropies $H(X | x_1)$ and $H(X | x_2)$ are given by the binary entropy functions $H_b(P(x_2 | x_1))$ 
and $H_b(P(x_1 | x_2))$, respectively. How does the entropy of the Markov source compare with 
the entropy of a binary DMS with the same output letter probabilities $P(x_1)$ and $P(x_2)$? 

FIGURE P6.13 
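
The entropy formula above can be evaluated mechanically for any two-state source once the stationary state probabilities are known. The sketch below assumes a binary source with both transition probabilities positive; the function names and the symmetric example value 0.1 are hypothetical and deliberately differ from the values in Figure P6.13.

```python
import math

def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def two_state_markov_entropy(p21, p12):
    """Entropy of a binary first-order Markov source.

    p21 = P(x2 | x1) and p12 = P(x1 | x2); assumes p21 + p12 > 0.
    Returns (entropy in bits, (P(x1), P(x2))).
    """
    # Stationary state probabilities from the balance equation P(x1) p21 = P(x2) p12.
    p1 = p12 / (p21 + p12)
    p2 = p21 / (p21 + p12)
    h = p1 * binary_entropy(p21) + p2 * binary_entropy(p12)
    return h, (p1, p2)

# Hypothetical symmetric example: both transition probabilities equal 0.1.
h_markov, (p1, p2) = two_state_markov_entropy(0.1, 0.1)
h_dms = binary_entropy(p1)  # memoryless source with the same letter probabilities
print(h_markov, h_dms)
```

In this illustrative case the source with memory has a lower entropy than the memoryless source with the same letter probabilities.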




6.14 Show that, for a DMC, the average mutual information between a sequence $X_1, X_2, \ldots, X_n$ 
of channel inputs and the corresponding channel outputs satisfies the condition 

$$I(X_1 X_2 \cdots X_n; Y_1 Y_2 \cdots Y_n) \le \sum_{i=1}^{n} I(X_i; Y_i)$$

with equality if and only if the set of input symbols is statistically independent. 


6.15 Determine the differential entropy H(X) of the uniformly distributed random variable X 
with PDF 

$$p(x) = \begin{cases} a^{-1} & 0 \le x \le a \\ 0 & \text{otherwise} \end{cases}$$

for the following three cases: 

1. a = 1 

2. a = 4 

3. a = 1/4 

Observe from these results that H(X) is not an absolute measure, but only a relative 
measure of randomness. 


6.16 A DMS has an alphabet of five letters $x_i$, $i = 1, 2, \ldots, 5$, each occurring with probability 
1/5. Evaluate the efficiency of a fixed-length binary code in which 

1. Each letter is encoded separately into a binary sequence. 

2. Two letters at a time are encoded into a binary sequence. 

3. Three letters at a time are encoded into a binary sequence. 

6.17 Determine whether there exists a binary code with codeword lengths $(n_1, n_2, n_3, n_4) = (1, 2, 2, 3)$ 
that satisfy the prefix condition. 

6.18 Consider a binary block code with $2^n$ codewords of the same length n. Show that the Kraft 
inequality is satisfied for such a code. 
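
The Kraft inequality referred to in the two preceding problems is easy to test for a given set of codeword lengths. The following is a minimal sketch using exact rational arithmetic; the function names and the example lengths are hypothetical and are chosen to differ from those in the problems.

```python
from fractions import Fraction

def kraft_sum(lengths, alphabet_size=2):
    """Sum of D^(-n_k) over all codeword lengths n_k, computed exactly."""
    return sum(Fraction(1, alphabet_size ** n) for n in lengths)

def prefix_code_exists(lengths, alphabet_size=2):
    """A prefix code with these lengths exists if and only if the Kraft sum is at most 1."""
    return kraft_sum(lengths, alphabet_size) <= 1

# Hypothetical length sets (assumptions for illustration only).
ok_lengths = (1, 2, 3, 3)   # 1/2 + 1/4 + 1/8 + 1/8 = 1
bad_lengths = (1, 1, 2)     # 1/2 + 1/2 + 1/4 = 5/4

print(kraft_sum(ok_lengths), prefix_code_exists(ok_lengths))
print(kraft_sum(bad_lengths), prefix_code_exists(bad_lengths))
```

Using `Fraction` avoids floating-point rounding when the sum is exactly 1.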

6.19 A DMS has an alphabet of eight letters $x_i$, $i = 1, 2, \ldots, 8$, with probabilities 0.25, 0.20, 
0.15, 0.12, 0.10, 0.08, 0.05, and 0.05. 

1. Use the Huffman encoding procedure to determine a binary code for the source output. 

2. Determine the average number R of binary digits per source letter. 

3. Determine the entropy of the source and compare it with R. 
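
The Huffman procedure used in this and the following problems can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the book's own: it assumes a source with at least two letters, breaks ties by insertion order, and uses a hypothetical four-letter source rather than the probabilities given above.

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Binary Huffman code for a dict {symbol: probability} with at least two symbols."""
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable nodes, prefixing their codewords with 0 and 1.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

def average_length(probs, code):
    """Average number of binary digits per source letter."""
    return sum(probs[s] * len(code[s]) for s in probs)

def entropy_bits(probs):
    """Source entropy in bits per letter."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical dyadic source (an assumption for illustration only).
source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(source)
print(code, average_length(source, code), entropy_bits(source))
```

For a dyadic source such as this one the average codeword length equals the entropy; for other sources it lies between the entropy and the entropy plus one bit.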

6.20 A discrete memoryless source produces outputs $\{a_1, a_2, a_3, a_4, a_5, a_6\}$. The corresponding 
output probabilities are 0.7, 0.1, 0.1, 0.05, 0.04, and 0.01. 

1. Design a binary Huffman code for the source. Find the average codeword length. 
Compare it to the minimum possible average codeword length. 

2. Is it possible to transmit this source reliably at a rate of 1.5 bits per source symbol? 
Why? 

3. Is it possible to transmit the source at a rate of 1.5 bits per source symbol employing 
the Huffman code designed in part 1? 

6.21 A discrete memoryless source is described by the alphabet $\mathcal{X} = \{x_1, x_2, \ldots, x_8\}$, and 
the corresponding probability vector p = {0.2, 0.12, 0.06, 0.15, 0.07, 0.1, 0.13, 0.17}. 
Design a Huffman code for this source; find $\bar{L}$, the average codeword length for the 
Huffman code; and determine the efficiency of the code defined as 

$$\eta = \frac{H(X)}{\bar{L}}$$


6.22 The optimum four-level nonuniform quantizer for a Gaussian-distributed signal amplitude 
results in the four levels $a_1$, $a_2$, $a_3$, and $a_4$, with corresponding probabilities of occurrence 
$p_1 = p_2 = 0.3365$ and $p_3 = p_4 = 0.1635$. 

1. Design a Huffman code that encodes a single level at a time, and determine the average 
bit rate. 

2. Design a Huffman code that encodes two output levels at a time, and determine the 
average bit rate. 

3. What is the minimum rate obtained by encoding J output levels at a time as $J \to \infty$? 

6.23 A discrete memoryless source has an alphabet of size 7, $\mathcal{X} = \{x_1, x_2, x_3, x_4, x_5, x_6, x_7\}$, 
with corresponding probabilities {0.02, 0.11, 0.07, 0.21, 0.15, 0.19, 0.25}. 

1. Determine the entropy of this source. 

2. Design a Huffman code for this source, and find the average codeword length of the 
Huffman code. 

3. A new source $\mathcal{Y} = \{y_1, y_2, y_3\}$ is obtained by grouping the outputs of the source $\mathcal{X}$ as 

$$y_1 = \{x_1, x_2, x_5\}$$
$$y_2 = \{x_3, x_7\}$$
$$y_3 = \{x_4, x_6\}$$

Determine the entropy of $\mathcal{Y}$. 

4. Which source is more predictable, $\mathcal{X}$ or $\mathcal{Y}$? Why? 
6.24 An iid source $\ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots$ has the pdf 

$$f(x) = \begin{cases} e^{-x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

This source is quantized using the following scheme: 

$$\hat{X} = \begin{cases} 0.5 & 0 \le X < 1 \\ 1.5 & 1 \le X < 2 \\ 2.5 & 2 \le X < 3 \\ 3.5 & 3 \le X < 4 \\ 6 & \text{otherwise} \end{cases}$$

1. Design a Huffman code for the quantized source $\hat{X}$. 

2. What is the entropy of the quantized source $\hat{X}$? 

3. If the efficiency of the Huffman code is defined as the ratio of the entropy to the average 
codeword length of the Huffman code, determine the efficiency of the Huffman code 
designed in part 1. 

4. Now let $\tilde{X} = i + 0.5$, $i \le X < i + 1$, for $i = 0, 1, 2, \ldots$. Which random variable has 
a higher entropy, $\hat{X}$ or $\tilde{X}$? (There is no need to compute the entropy of $\tilde{X}$; just give your 
intuitive reasoning.) 

6.25 A stationary source generates outputs at a rate of 10,000 samples. The samples are 
independent and are uniformly distributed on the interval [−4, 4]. Throughout this problem 
the distortion measure is assumed to be squared-error distortion. 

1. If perfect (distortion-free) reconstruction of the source at the destination is required, 
what is the required transmission rate from the source to the destination? 

2. If the transmission rate from the source to the destination is zero, what is the minimum 
achievable distortion? 

3. If a five-level uniform quantizer is designed for this source and the quantizer output is 
entropy-coded using a Huffman code designed for single-source outputs, what is the 
resulting transmission rate and distortion? 

4. In part 3 if the Huffman code is designed for very large blocks of source outputs rather 
than single source outputs, what is the resulting transmission rate and distortion? 

6.26 A memoryless source has the alphabet $\mathcal{A} = \{-5, -3, -1, 0, 1, 3, 5\}$, with corresponding 
probabilities {0.05, 0.1, 0.1, 0.15, 0.05, 0.25, 0.3}. 

1. Find the entropy of the source. 

2. Assuming that the source is quantized according to the quantization rule 

$$q(-5) = q(-3) = -4$$
$$q(-1) = q(0) = q(1) = 0$$
$$q(3) = q(5) = 4$$

find the entropy of the quantized source. 

6.27 Design a ternary Huffman code, using 0, 1, and 2 as letters, for a source with output alpha- 
bet probabilities given by {0.05, 0.1, 0.15, 0.17, 0.18, 0.22, 0.13}. What is the resulting 
average codeword length? Compare the average codeword length with the entropy of the 
source. (In what base would you compute the logarithms in the expression for the entropy 
for a meaningful comparison?) 

6.28 Two discrete memoryless information sources X and Y each have an alphabet with 
six symbols, $\mathcal{X} = \mathcal{Y} = \{1, 2, 3, 4, 5, 6\}$. The probabilities of the letters for X are 
1/2, 1/4, 1/8, 1/16, 1/32, and 1/32. The source Y has a uniform distribution. 

1. Which source is less predictable and why? 

2. Design Huffman codes for each source. Which Huffman code is more efficient? (Effi- 
ciency of a Huffman code is defined as the ratio of the source entropy to the average 
codeword length.) 

3. If Huffman codes were designed for the second extension of these sources (i.e., two 
letters at a time), for which source would you expect a performance improvement 
compared to the single-letter Huffman code and why? 

4. Now assume the two sources are independent and a new source Z is defined to be the 
sum of the two sources, i.e., Z = X + Y. Determine the entropy of this source, and 
verify that H(Z) < H(X) + H(Y). 

5. How do you justify the fact that H(Z) < H(X) + H(Y)? Under what circumstances 
can you have H(Z) = H(X) + H(Y)? Is there a case where you can have H(Z) > 
H(X) + H(Y)? Why? 

6.29 A function g(x) is convex on (a, b) if for any $x_1, x_2 \in (a, b)$ and any $0 \le \lambda \le 1$ 

$$g(\lambda x_1 + (1 - \lambda) x_2) \le \lambda g(x_1) + (1 - \lambda) g(x_2)$$

The function g(x) is convex if its second derivative is nonnegative in the given interval. A 
function g(x) is called concave if −g(x) is convex. 

1. Show that the binary entropy function, $H_b(p)$, is concave on (0, 1). 

2. Show that Q(x) is convex on (0, ∞). 

3. Show that if X is a binary-valued random variable with range in (a, b) and g(X) is 
convex on (a, b), then 

$$g(E[X]) \le E[g(X)]$$

4. Extend the result of part 3 to any random variable X with range in (a, b). This result is 
known as Jensen's inequality. 

5. Use Jensen's inequality to prove that if X is a positive-valued random variable, then 

$$E[Q(X)] \ge Q(E[X])$$

6.30 Find the Lempel-Ziv source code for the binary source sequence 

000100100000011000010000000100000010100001000000110100000001100 

Recover the original sequence back from the Lempel-Ziv source code. Hint: You require 
two passes of the binary sequence to decide on the size of the dictionary. 

6.31 A continuous-valued, discrete-time, iid (independent and identically distributed) infor- 
mation source $\ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots$ has the probability density function (PDF) 
given by 


x > 0 

otherwise 
This source is quantized to source $\hat{X}$ using the following quantization rule: 

$$\hat{X} = \begin{cases} 0.5 & 0 \le X < 1 \\ 1.5 & 1 \le X < 2 \\ 2.5 & 2 \le X < 3 \\ 6 & \text{otherwise} \end{cases}$$

1. What is the minimum required rate for lossless transmission of the nonquantized source 
X? 

2. What is the minimum required rate for lossless transmission of the quantized source 
$\hat{X}$? 

3. Let $\tilde{X}$ be another quantization of X given by $\tilde{X} = i + 0.25$ if $i \le X < i + 1$ for 
$i = 0, 1, 2, \ldots$. Which random variable has a higher entropy, $\hat{X}$ or $\tilde{X}$? (There is no 
need to compute the entropy of $\tilde{X}$; just give your intuitive reasoning.) 

4. Let us define a new quantization rule as $Y = \hat{X} + \tilde{X}$. Which of the three relations given 
below are true (if any)? 

(a) $H(Y) = H(\hat{X}) + H(\tilde{X})$ 

(b) $H(Y) = H(\hat{X})$ 

(c) $H(Y) = H(\tilde{X})$ 

Give your intuitive reason in one short paragraph; no computation is required. 


6.32 Find the differential entropy of the continuous random variable X in the following cases: 

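1. X is an exponential random variable with parameter $\lambda > 0$, i.e., 

$$p(x) = \begin{cases} \frac{1}{\lambda} e^{-x/\lambda} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

2. X is a Laplacian random variable with parameter $\lambda > 0$, i.e., 

$$p(x) = \frac{1}{2\lambda} e^{-|x|/\lambda}$$

3. X is a triangular random variable with parameter $\lambda > 0$, i.e., 

$$p(x) = \begin{cases} (x + \lambda)/\lambda^2 & -\lambda \le x \le 0 \\ (-x + \lambda)/\lambda^2 & 0 < x \le \lambda \\ 0 & \text{otherwise} \end{cases}$$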


6.33 It can be shown that the rate distortion function for a Laplacian source $p(x) = (2\lambda)^{-1} e^{-|x|/\lambda}$ 
with an absolute value of error distortion measure $d(x, \hat{x}) = |x - \hat{x}|$ is given by 

$$R(D) = \begin{cases} \log(\lambda/D) & 0 \le D \le \lambda \\ 0 & D > \lambda \end{cases}$$

(see Berger, 1971). 

1. How many bits per sample are required to represent the outputs of this source with an 
average distortion not exceeding $\lambda/2$? 

2. Plot R(D) for three different values of $\lambda$, and discuss the effect of changes in $\lambda$ on these 
plots. 


6.34 Three information sources X, Y, and Z are considered. 

1. X is a binary discrete memoryless source with p(X = 0) = 0.4. This source is to be 
reproduced at the receiving end with an error probability not exceeding 0.1. 

2. Y is a memoryless Gaussian source with mean 0 and variance 4. This source is to be 
reproduced with a squared-error distortion not exceeding 1.5. 

3. Z is a memoryless source and has a distribution given by 

$$f_Z(z) = \begin{cases} 1/5 & -2 \le z \le 0 \\ 3/10 & 0 < z \le 2 \\ 0 & \text{otherwise} \end{cases}$$

This source is quantized using a uniform quantizer with eight quantization levels to 
get the quantized source $\hat{Z}$. The quantized source is required to be transmitted with no 
errors. 

In each of the three cases, determine the absolute minimum rate required per source symbol 
(i.e., you can use systems of arbitrary complexity). 

6.35 It can be shown that if X is a zero-mean continuous random variable with variance $\sigma^2$, 
its rate distortion function, subject to squared-error distortion measure, satisfies the lower 
and upper bounds given by the inequalities 

$$H(X) - \frac{1}{2} \log(2\pi e D) \le R(D) \le \frac{1}{2} \log \frac{\sigma^2}{D}$$

where H(X) denotes the differential entropy of the random variable X (see Cover and 
Thomas, 2006). 

1. Show that, for a Gaussian random variable, the lower and upper bounds coincide. 

2. Plot the lower and upper bounds for a Laplacian source with $\sigma = 1$. 

3. Plot the lower and upper bounds for a triangular source with $\sigma = 1$. 

6.36 A DMS has an alphabet of eight letters $x_i$, $i = 1, 2, \ldots, 8$, with probabilities given in 
Problem 6.19. Use the Huffman encoding procedure to determine a ternary code (using 
symbols 0, 1, and 2) for encoding the source output. (Hint: Add a symbol $x_9$ with probability 
$p_9 = 0$, and group three symbols at a time.) 

6.37 Show that the entropy of an n-dimensional Gaussian vector $X = (x_1\, x_2 \cdots x_n)$ with zero 
mean and covariance matrix C is 

$$H(X) = \frac{1}{2} \log (2\pi e)^n |C|$$

6.38 Evaluate the rate distortion function for an M-ary symmetric source under Hamming 
distortion (probability of error) given as 

$$R(D) = \log M + D \log D + (1 - D) \log \frac{1 - D}{M - 1}$$

for M = 2, 4, 8, and 16. 

6.39 Consider the use of the weighted mean square error (MSE) distortion measure defined as 

$$d_W(X, \hat{X}) = (X - \hat{X})' W (X - \hat{X})$$

where W is a symmetric, positive-definite weighting matrix. By factorizing W as $W = P'P$, 
show that $d_W(X, \hat{X})$ is equivalent to an unweighted MSE distortion measure $d_2(X', \hat{X}')$ 
involving transformed vectors $X'$ and $\hat{X}'$. 

6.40 A discrete memoryless source produces outputs $\{a_1, a_2, a_3, a_4, a_5\}$. The corresponding 
output probabilities are 0.8, 0.1, 0.05, 0.04, and 0.01. 

1. Design a binary Huffman code for the source. Find the average codeword length. 
Compare it to the minimum possible average codeword length. 

2. Assume that we have a binary symmetric channel with crossover probability $\epsilon = 0.3$. 
Is it possible to transmit the source reliably over the channel? Why? 

3. Is it possible to transmit the source over the channel employing Huffman code designed 
for single source outputs? 

6.41 A discrete-time memoryless Gaussian source with mean 0 and variance $\sigma^2$ is to be 
transmitted over a binary symmetric channel with crossover probability $\epsilon$. 

1. What is the minimum value of the distortion attainable at destination? (Distortion is 
measured in mean squared error.) 

2. If the channel is discrete-time memoryless additive Gaussian noise with input power P 
and noise power $\sigma^2$, what is the minimum attainable distortion? 

3. Now assume that the source has the same basic properties but is not memoryless. Do 
you expect that the distortion in transmission over the binary symmetric channel to be 
decreased or increased? Why? 

6.42 An additive white Gaussian noise channel has the output Y = X + N, where X is the 
channel input and N is the noise with probability density function 

$$p(n) = \frac{1}{\sqrt{2\pi}\,\sigma_n} e^{-n^2/2\sigma_n^2}$$

If X is a white Gaussian input with $E(X) = 0$ and $E(X^2) = \sigma_x^2$, determine 

1. The conditional differential entropy H(X|N) 

2. The mutual information I(X; Y) 


6.43 For the channel shown in Figure P6.43, find the channel capacity and the input distribution 
that achieves capacity. 



6.44 A discrete memoryless source produces outputs $\{a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8\}$. The 
corresponding output probabilities are 0.05, 0.07, 0.08, 0.1, 0.1, 0.15, 0.2, and 0.25. 

1. Design a binary Huffman code for the source. Find the average codeword length. 
Compare it to the minimum possible average codeword length. 

2. What is the minimum channel capacity required to transmit this source reliably? Can 
this source be reliably transmitted via a binary symmetric channel? 
3. If a discrete memoryless zero-mean Gaussian source with $\sigma^2 = 1$ is to be transmitted 
via the channel of part 2, what is the minimum attainable mean squared distortion? 

6.45 Find the capacity of channels A and B as shown in Figure P6.45. What is the capacity of 
the cascade channel AB? (Hint: Look carefully at the channels, and avoid lengthy math.) 

FIGURE P6.45 (Channel A, Channel B) 

6.46 Each sample of a Gaussian memoryless source has a variance equal to 4, and the source 
produces 8000 samples per second. The source is to be transmitted via an additive white 
Gaussian noise channel with a bandwidth equal to 4000 Hz, and it is desirable to have a 
distortion per sample not exceeding 1 at the destination (assume squared-error distortion). 

1. What is the minimum required signal-to-noise ratio of the channel? 

2. If it is further assumed that, on the same channel, a BPSK scheme is employed with 
hard decision decoding, what will be the minimum required channel signal-to-noise 
ratio? 

Note: the signal-to-noise ratio of the channel is defined by . 

6.47 A communication channel is shown in Figure P6.47. 

FIGURE P6.47 

1. Show that, regardless of the contents of the probability transition matrix of the channel, 
we have 

$$C \le \log_2 3 \approx 1.585 \text{ bits per transmission}$$

2. Determine one probability transition matrix under which the above upper bound is 
achieved. 
3. Assuming that a Gaussian source with variance $\sigma^2 = 1$ is to be transmitted via the 
channel in part 2, what is the minimum achievable distortion? (Mean squared distortion 
is assumed throughout.) 

6.48 X is a binary memoryless source with P{X = 0) = 0.3. This source is transmitted over a 
binary symmetric channel with crossover probability p = 0.1. 

1. Assume that the source is directly connected to the channel; i.e., no coding is employed. 
What is the error probability at the destination? 

2. If coding is allowed, what is the minimum possible error probability in the reconstruc- 
tion of the source? 

3. For what values of p is reliable transmission possible (with coding, of course)? 

6.49 Two discrete memoryless information sources $S_1$ and $S_2$ each have an alphabet with six 
symbols, $S_1 = \{x_1, x_2, \ldots, x_6\}$ and $S_2 = \{y_1, y_2, \ldots, y_6\}$. The probabilities of the letters 
for the first source are 1/2, 1/4, 1/8, 1/16, 1/32, and 1/32. The second source has a 
uniform distribution. 

1. Which source is less predictable and why? 

2. Design Huffman codes for each source. Which Huffman code is more efficient? 
(Efficiency of a Huffman code is defined as the ratio of the source entropy to the 
average codeword length.) 

3. If Huffman codes were designed for the second extension of these sources (i.e., two 
letters at a time), for which source would you expect a performance improvement 
compared to the single-letter Huffman code and why? 

6.50 Show that the capacity of a binary-input, continuous-output AWGN channel with 
input-output relation 

$$y_i = x_i + n_i$$

where $x_i = \pm A$ and the noise components are iid zero-mean Gaussian random variables 
with variance $\sigma^2$, is given by Equations 6.5-31 and 6.5-32. 

6.51 A discrete memoryless channel is shown in Figure P6.51. 

FIGURE P6.51 

1. Determine the capacity of this channel. 

2. Determine $R_0$ for this channel. 

3. If a discrete-time memoryless Gaussian source with a variance of 4 is to be transmitted 
by this channel, and for each source output, two uses of channel are allowed, what is 
the absolute minimum to the achievable squared-error distortion? 
6.52 Show that the following two relations are necessary and sufficient conditions for the set of 
input probabilities $\{P(x_j)\}$ to maximize I(X; Y) and, thus, to achieve capacity for a DMC: 

$$I(x_j; Y) = C \quad \text{for all } j \text{ with } P(x_j) > 0$$

$$I(x_j; Y) \le C \quad \text{for all } j \text{ with } P(x_j) = 0$$

where C is the capacity of the channel, $Q = |\mathcal{Y}|$, and 

$$I(x_j; Y) = \sum_{i=0}^{Q-1} P(y_i | x_j) \log \frac{P(y_i | x_j)}{P(y_i)}$$

6.53 Figure P6.53 illustrates an M-ary symmetric DMC with transition probabilities $P(y|x) = 1 - p$ 
when $x = y = k$ for $k = 0, 1, \ldots, M - 1$, and $P(y|x) = p/(M - 1)$ when $x \ne y$. 

1. Show that this channel satisfies the condition given in Problem 6.52 when $P(x_k) = 1/M$. 

2. Determine and plot the channel capacity as a function of p. 

FIGURE P6.53 

6.54 Determine the capacities of the channels shown in Figure P6.54. 





6.55 Consider the two channels with the transition probabilities as shown in Figure P6.55. 
Determine if equally probable input symbols maximize the information rate through the 
channel. 
6.56 A telephone channel has a bandwidth W = 3000 Hz and a signal-to-noise power ratio of 
400 (26 dB). Suppose we characterize the channel as a band-limited AWGN waveform 
channel with $P_{av}/WN_0 = 400$. Determine the capacity of the channel in bits per second. 


6.57 Consider the binary-input, quaternary-output DMC shown in Figure P6.57. 

1. Determine the capacity of the channel. 

2. Show that this channel is equivalent to a BSC. 



FIGURE P6.57 


6.58 Determine the capacity for the channel shown in Figure P6.58. 



6.59 Consider a BSC with crossover probability of p. Suppose that R is the number of bits in 
a source codeword that represents one of $2^R$ possible levels at the output of a quantizer. 

1. Determine the probability that a codeword transmitted over the BSC is received 
correctly. 

2. Determine the probability of having at least one bit error in a codeword transmitted 
over the BSC. 

3. Determine the probability of having $n_e$ or fewer bit errors in a codeword. 

4. Evaluate the probabilities in parts 1, 2, and 3 for R = 5, p = 0.1, and $n_e = 5$. 
6.60 Figure P6.60 illustrates a binary erasure channel with transition probabilities P(0|0) = 
P(1|1) = 1 − p and P(e|0) = P(e|1) = p. The probabilities for the input symbols are 
$P(X = 0) = \alpha$ and $P(X = 1) = 1 - \alpha$. 

1. Determine the average mutual information I(X; Y) in bits. 

2. Determine the value of $\alpha$ that maximizes I(X; Y), i.e., the channel capacity C in bits 
per channel use, and plot C as a function of p for the optimum value of $\alpha$. 

3. For the value of $\alpha$ found in part 2, determine the mutual information I(x; y) = 
I(0; 0), I(1; 1), I(0; e), and I(1; e), where 

$$I(x; y) = \log \frac{P[X = x, Y = y]}{P[X = x]\,P[Y = y]}$$

FIGURE P6.60 
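
The average mutual information of any DMC can be computed numerically from the input probabilities and the transition matrix, which is a convenient way to check a closed-form answer. The following is a minimal sketch; the function name, the matrix layout, and the example values (erasure probability 0.5, equiprobable inputs) are hypothetical and are not taken from the problem.

```python
import math

def mutual_information(px, channel):
    """Average mutual information I(X; Y) in bits.

    px[i] = P(X = i) and channel[i][j] = P(Y = j | X = i).
    """
    num_outputs = len(channel[0])
    # Output probabilities P(Y = j) obtained by averaging over the inputs.
    py = [sum(px[i] * channel[i][j] for i in range(len(px)))
          for j in range(num_outputs)]
    total = 0.0
    for i, p_in in enumerate(px):
        for j in range(num_outputs):
            joint = p_in * channel[i][j]
            if joint > 0:
                total += joint * math.log2(channel[i][j] / py[j])
    return total

# Hypothetical binary erasure channel with outputs ordered (0, e, 1).
bec = [[0.5, 0.5, 0.0],
       [0.0, 0.5, 0.5]]
i_bec = mutual_information([0.5, 0.5], bec)

# Noiseless binary channel for comparison.
i_noiseless = mutual_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
print(i_bec, i_noiseless)
```

Sweeping the input probability over a grid and taking the maximum of this function gives a numerical estimate of the capacity.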


6.61 A discrete-time zero-mean Gaussian random process has a variance per sample of $\sigma_1^2$. This 
source generates outputs at a rate of 1000 per second. The samples are transmitted over 
a discrete-time AWGN channel with input power constraint of P and noise variance per 
sample of $\sigma_2^2$. This channel is capable of transmitting 500 symbols per second. 

1. If the source is to be transmitted over the channel, you are allowed to employ processing 
schemes of any degree of complexity, and any delay is acceptable, what is the minimum 
achievable distortion per sample? 

2. If the channel remains the same but you have to use binary antipodal signals at the 
input and employ hard decision decoding at the output (again no limit on complexity 
and delay), what is the minimum achievable distortion per sample? 

3. Now assume that the source has the same statistics but is not memoryless. Comparing 
with part 1, do you expect the distortion to decrease or increase? Give your answer in 
a short paragraph. 

6.62 A binary memoryless source generates 0 and 1 with probabilities 1/3 and 2/3, respectively. 
This source is to be transmitted over an AWGN channel using binary PSK modulation. 

1. What is the absolute minimum $E_b/N_0$ required to be able to transmit the source reliably, 
assuming that hard decision decoding is employed by the channel and for each source 
output you can use one channel transmission? 

2. Under the same conditions as in part 1, find the minimum $E_b/N_0$ required for reliable 
transmission of the source if we can transmit at a rate at most equal to the cutoff rate 
of the channel. 

3. Now assume the source is a zero-mean memoryless Gaussian source with variance 1. 
Answer part 1 if our goal is reproduction of the source with a mean-squared distortion 
of at most 1/4. 

6.63 A discrete memoryless source U is to be transmitted over a memoryless communication 
channel. For each source output, the channel can be used only once. Determine the min- 
imum theoretical distortion achievable in transmission of the source over the channel in 
each of the following cases. 
1. The source is a binary source with 0 and 1 as its outputs with p(U = 0) = 0.1; the 
channel is a binary symmetric channel with crossover probability $\epsilon = 0.1$; and the 
distortion measure is the Hamming distortion (probability of error). 

2. The channel is as in part 1, but the source is a zero-mean Gaussian source with variance 
1. The distortion is the squared-error distortion. 

3. The source is as in part 2, and the channel is a discrete-time AWGN channel with input 
power constraint P and noise variance $\sigma^2$. 

6.64 Channel $C_1$ is an additive white Gaussian noise channel with a bandwidth W, average 
transmitter power P, and noise power spectral density $\frac{1}{2}N_0$. Channel $C_2$ is an additive 
Gaussian noise channel with the same bandwidth and power as channel $C_1$ but with noise 
power spectral density $S_n(f)$. It is further assumed that the total noise power for both 
channels is the same; i.e., 

$$\int_{-W}^{W} S_n(f)\,df = \int_{-W}^{W} \frac{1}{2} N_0\,df = N_0 W$$

Which channel do you think has a larger capacity? Give an intuitive reasoning. 


6.65 A discrete memoryless ternary erasure communication channel is shown in Figure P6.65. 

FIGURE P6.65 

1. Determine the capacity of this channel. 

2. A memoryless exponential source X with probability density function 

$$f_X(x) = \begin{cases} 2e^{-2x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

is quantized using a two-level quantizer defined by 

$$\hat{X} = q(X) = \begin{cases} 0 & X < 2 \\ 1 & \text{otherwise} \end{cases}$$

Can $\hat{X}$ be reliably transmitted over the channel shown above? Why? (The number of 
source symbols per second is equal to the number of channel symbols per second.) 


6.66 Plot the capacity of an AWGN channel that employs binary antipodal signaling, with 
optimal bit-by-bit detection at the receiver, as a function of $E_b/N_0$. On the same axis, plot 
the capacity of the same channel when binary orthogonal signaling is employed. 

6.67 A discrete-time memoryless Gaussian source with mean 0 and variance $\sigma^2$ is to be 
transmitted over a binary symmetric channel with crossover probability p. 

1. What is the minimum value of the distortion attainable at the destination (distortion is 
measured in mean-squared error)? 

2. If the channel is a discrete-time memoryless additive Gaussian noise channel with input 
power P and noise power $P_n$, what is the minimum attainable distortion? 

3. Now assume that the source has the same basic properties but is not memoryless. Do you 
expect the distortion in transmission over the binary symmetric channel to be decreased 
or increased? Why? 

6.68 Find the capacity of the cascade connection of n binary symmetric channels with the same 
crossover probability $\epsilon$. What is the capacity when the number of channels goes to infinity? 


6.69 Channels 1, 2, and 3 are shown in Figure P6.69. 

1. Find the capacity of channel 1. What input distribution achieves capacity? 

2. Find the capacity of channel 2. What input distribution achieves capacity? 

3. Let C denote the capacity of the third channel and $C_1$ and $C_2$ represent the 
capacities of the first and second channels. Which of the following relations holds 
true and why? 

$$C < \frac{1}{2}(C_1 + C_2)$$
$$C = \frac{1}{2}(C_1 + C_2)$$
$$C > \frac{1}{2}(C_1 + C_2)$$

FIGURE P6.69 (Channel 1, Channel 2) 

6.70 Let C denote the capacity of a discrete memoryless channel with input alphabet 
$\mathcal{X} = \{x_1, x_2, \ldots, x_N\}$ and output alphabet $\mathcal{Y} = \{y_1, y_2, \ldots, y_M\}$. Show that 
$C \le \min\{\log M, \log N\}$. 

6.71 The channel C (known as the Z channel) is shown in Figure P6.71. 

1. Find the input probability distribution that achieves capacity. 

2. What is the input distribution and capacity for the special cases $\epsilon = 0$, $\epsilon = 1$, and 
$\epsilon = 0.5$? 

3. Show that if n such channels are cascaded, the resulting channel will be equivalent to 
a Z channel with $\epsilon_1 = \epsilon^n$. 

4. What is the capacity of the equivalent Z channel when $n \to \infty$? 

FIGURE P6.71 

6.72 Find the capacity of an additive white Gaussian noise channel with a bandwidth of 1 MHz, 
power 10 W, and noise power spectral density $\frac{1}{2}N_0 = 10^{-9}$ W/Hz. 

6.73 A Gaussian memoryless source is distributed according to $\mathcal{N}(0, 1)$. This source is to be 
transmitted over a binary symmetric channel with a crossover probability of $\epsilon = 0.1$. For 
each source output one use of channel is possible. The fidelity measure is squared-error 
distortion, i.e., $d(x, \hat{x}) = (x - \hat{x})^2$. 

1. In the first approach we use the optimum one-dimensional (scalar) quantizer. This 
results in the following quantization rule 


where $\hat{x} = 0.798$ and the resulting distortion is 0.3634. Then $\hat{x}$ and $-\hat{x}$ are represented 
by 0 and 1 and directly transmitted over the channel (no channel coding). Determine 
the resulting overall distortion using this approach. 

2. In the second approach we use the same quantizer used in part 1 , but we allow the use of 
arbitrarily complex channel coding. How would you determine the resulting distortion 
in this case, and why? 

3. Now assume that after quantization, an arbitrarily complex lossless compression scheme 
is employed and the output is transmitted over the channel (again using channel coding, 
as explained in part 2). How would the resulting distortion compare with part 2? 

4. If you were allowed to use an arbitrarily complex source and channel coding scheme, 
what would be the minimum achievable distortion? 

5. If the source is Gaussian with the same per-letter statistics (i.e., each letter is $\mathcal{N}(0, 1)$) 
but the source has memory (for instance, a Gauss-Markov source), do you think the 
distortion you derived in part 4 would increase, decrease, or not change? Why? 

6.74 For the channel shown in Figure P6.65: 

1. Consider an extension of the channel with inputs a\, ai, . . . , a n , outputs ai, 02 , , 
a„, E, where F , (a i |a,-) = P{E\a t ) = for all 1 < i < n, and all other transition 
probabilities are zero. What is the capacity of this channel? What is the capacity when 


2. If a memoryless binary equiprobable source is transmitted via the channel shown in 
Figure P6.65, what is the minimum attainable error probability, assuming no limit is 
imposed on the complexity and delay of the system? (The number of source symbols 
per second is equal to the number of channel symbols per second.) For what values of 
n in part 2 can the source be reliably transmitted over the channel? 

3. If a Gaussian source distributed according to $\mathcal{N}(m, \sigma^2)$ is transmitted via the channel in 
part 2, what is the minimum attainable mean-squared distortion in regeneration of this 
source as a function of n and $\sigma^2$? (Again the number of source symbols per second is 
equal to the number of channel symbols per second, and no limit is imposed on system 
complexity and delay.) 



x > 0 

jc < 0 


6.75 Using the expression for the cutoff rate $R_0$ for the BSC, given in Equation 6.8-29, plot $R_0$ as 
a function of $\mathcal{E}_c/N_0$ for the following binary modulation methods: 

1. Antipodal signaling: $p = Q\left(\sqrt{2\mathcal{E}_c/N_0}\right)$ 

2. Orthogonal signaling: $p = Q\left(\sqrt{\mathcal{E}_c/N_0}\right)$ 

3. DPSK: $p = \frac{1}{2}e^{-\mathcal{E}_c/N_0}$ 

Comment on the difference in performance for the three modulation methods, as given by 
the cutoff rate. 

6.76 Consider the binary-input, ternary-output channel with transition probabilities shown in 
Figure P6.76, where e denotes an erasure. For the AWGN channel, $\alpha$ and $p$ are defined as 


«= ‘ /VWS)*/** 


1 - p - a 



FIGURE P6.76 


1. Determine the cutoff rate $R_0$ as a function of the probabilities $\alpha$ and $p$. 

2. The cutoff rate $R_0$ depends on the choice of the threshold $\beta$ through the probabilities $\alpha$ 
and $p$. For any $\mathcal{E}_c/N_0$, the value of $\beta$ that maximizes $R_0$ can be determined by trial and 
error. For example, it can be shown that for $\mathcal{E}_c/N_0$ below 0 dB, $\beta_{opt} = 0.65\sqrt{\frac{1}{2}N_0}$; for 
$1 \le \mathcal{E}_c/N_0 \le 10$, $\beta_{opt}$ varies approximately linearly between $0.65\sqrt{\frac{1}{2}N_0}$ and $\sqrt{\frac{1}{2}N_0}$. 
By using $\beta = 0.65\sqrt{\frac{1}{2}N_0}$ for the entire range of $\mathcal{E}_c/N_0$, plot $R_0$ versus $\mathcal{E}_c/N_0$ and 
compare this result with $R_0$ for an unquantized (continuous) output channel. 

6.77 Show that for M-ary PSK signaling the cutoff rate $R_0$ is given by 

$$R_0 = \log_2 M - \log_2 \sum_{k=0}^{M-1} e^{-\|s_0 - s_k\|^2/4N_0} = \log_2 M - \log_2 \sum_{k=0}^{M-1} e^{-(\mathcal{E}_c/N_0)\sin^2(\pi k/M)}$$

Plot $R_0$ as a function of $\mathcal{E}_c/N_0$ for M = 2, 4, 8, and 16. 


6.78 A discrete-time additive non-Gaussian noise channel is described by the input-output 
relation 


$$y_i = x_i + n_i$$

where $n_i$ represents a sequence of iid noise random variables with probability density 
function 


and $x_i$ can take $\pm 1$ with equal probability, where i represents the time index. 

1. Determine the cutoff rate Rq for this channel. 

2. Assume that this channel is used with optimal hard decision decoding at the output. 
What is the crossover probability of the resulting BSC channel? 

3. What is the cutoff rate in part 2? 

6.79 Show that the cutoff rate for an M-ary orthogonal signaling system where each signal 
has energy $\mathcal{E}$ and the channel is AWGN with noise power spectral density of $\frac{1}{2}N_0$ can be 
expressed as 


where $p_n(\cdot)$ represents the PDF of an $\mathcal{N}(0, \frac{1}{2}N_0)$ random variable. Conclude that the above 
expression is simplified as 


p(n) = w 



$$R_0 = \log_2 \frac{M}{1 + (M - 1)e^{-\mathcal{E}/2N_0}}$$

We have studied the performance of different signaling methods when transmitted 
through an AWGN channel in Chapter 4. In particular we have seen how the error 
probability of each signaling method is related to the SNR per bit. In that chapter 
we were mainly concerned with the case where M possible messages are sent by 
transmitting one of the M possible waveforms, rather than blocks of channel inputs. 
We also introduced criteria for comparing power and bandwidth efficiency of different 
signaling schemes. The power efficiency is usually measured in terms of the required 
SNR per bit to achieve a certain error probability. The lower the required SNR per 
bit, the more power-efficient the system is. The bandwidth efficiency of the system is 
measured by the spectral bit rate r = R/W which determines how many bits per second 
can be transmitted in 1 Hz of bandwidth. Systems with high spectral bit rate are highly 
bandwidth-efficient systems. We also saw that there is a trade-off between bandwidth 
and power efficiency. Modulation schemes such as QAM are highly bandwidth-efficient, 
and signaling schemes such as orthogonal signaling are power-efficient at the expense 
of high bandwidth demand. 

In Chapter 6 we saw that reliable communication over a noisy channel is possible 
if the transmission rate is less than channel capacity. Reliable communication is made 
possible through channel coding, i.e., assigning messages to blocks of channel inputs 
and using only a subset of all possible blocks. In Chapter 6 we did not study specific 
mappings between messages and channel input sequences. Both channel capacity $C$ 
and channel cutoff rate $R_0$ were presented using random coding. In random coding 
we do not find the best mapping from the message set to channel input sequences and 
analyze the performance of that mapping; rather we average the error probability over 
all possible mappings and show that if the transmission rate is less than channel capacity, 
the ensemble average of the error probability, averaged over all possible mappings, goes 
to zero as the block length increases. From this we concluded that there must exist at 
least one mapping among all mappings for which the error probability goes to zero as 
the block length increases. The original proof of the channel coding theorem, presented 
by Shannon in 1948, was based on random coding, and hence was not constructive in the 
sense that it proved only the existence of good codes but did not provide any method for 
their design. Of course, based on the idea of random coding, one can argue that there is 
a good chance that a randomly generated code is a good code. The problem, however, is 
that the decoding of a randomly generated code when the codeword sequences are long 
becomes extremely complex, thus making its use in practical systems impossible. The 
development of coding theory in the decades after 1948 has been focused on designing 
coding schemes that have sufficient structure to make their decoding practical and at 
the same time close the gap between an uncoded system and the bounds derived by 
Shannon. In Chapter 6 we also derived a fundamental relation between r, the spectral 
bit rate, and the SNR per bit of an ideal communication system given by 

$$\frac{\mathcal{E}_b}{N_0} > \frac{2^r - 1}{r}$$

By comparing the bandwidth and power efficiency of a given system with the bound 
given in this equation, we can see how much that system can be improved. 
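
As a rough illustration of how this bound can be used, the following sketch evaluates the right-hand side $(2^r - 1)/r$ in decibels and reports how far a given operating point lies above it. The function names `min_ebn0_db` and `gap_to_bound_db` and the sample values of r are illustrative assumptions, not notation from the text.

```python
import math

def min_ebn0_db(r: float) -> float:
    """Right-hand side of the bound Eb/N0 > (2^r - 1)/r, expressed in dB."""
    if r <= 0:
        raise ValueError("spectral bit rate r must be positive")
    return 10.0 * math.log10((2.0 ** r - 1.0) / r)

def gap_to_bound_db(r: float, ebn0_db: float) -> float:
    """Distance in dB between an operating point (r, Eb/N0) and the bound."""
    return ebn0_db - min_ebn0_db(r)

if __name__ == "__main__":
    # Sample spectral bit rates (assumed values, chosen only for illustration).
    for r in (0.01, 0.5, 1.0, 2.0, 4.0):
        print(f"r = {r:5.2f} bits/s/Hz  ->  Eb/N0 > {min_ebn0_db(r):6.2f} dB")
```

As r becomes small the bound approaches $\ln 2$, i.e., about $-1.6$ dB, and it grows quickly as r increases, which is one way to see the trade-off between bandwidth and power efficiency.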

Our focus in this chapter and Chapter 8 is on channel coding schemes with man- 
ageable decoding algorithms that are used to improve performance of communication 
systems over noisy channels. This chapter is devoted to block codes whose construction 
is based on familiar algebraic structures such as groups, rings, and fields. In Chapter 8 
we will study coding schemes that are best represented in terms of graphs and trellises. 


7.1 
BASIC DEFINITIONS 

Channel codes can be classified into two major classes, block codes and convolutional 
codes. In block codes one of the $M = 2^k$ messages, each representing a binary sequence 
of length k, called the information sequence, is mapped to a binary sequence of length 
n, called the codeword, where n > k. The codeword is usually transmitted over the 
communication channel by sending a sequence of n binary symbols, for instance, 
by using BPSK. QPSK and BFSK are other types of signaling schemes frequently 
used for transmission of a codeword. Block coding schemes are memoryless. After a 
codeword is encoded and transmitted, the system receives a new set of k information 
bits and encodes them using the mapping defined by the coding scheme. The resulting 
codeword depends only on the current k information bits and is independent of all the 
codewords transmitted before. 

Convolutional codes are described in terms of finite-state machines. In these codes, 
at each time instance i, k information bits enter the encoder, causing n binary symbols 
to be generated at the encoder output and changing the state of the encoder from $\sigma_{i-1}$ to $\sigma_i$. 
The set of possible states is finite and denoted by $\Sigma$. The n binary symbols generated 
at the encoder output and the next state $\sigma_i$ depend on the k input bits as well as $\sigma_{i-1}$. 
We can represent a convolutional code by a shift register of length Kk as shown in 
Figure 7.1-1. 

At each time instance, k bits enter the encoder and the contents of the shift register 
are shifted to the right by k memory elements. The contents of the rightmost k elements 
of the shift register leave the encoder. After the k bits have entered the shift register, 
the n adders add the contents of the memory elements they are connected to (modulo-2 
addition), thus generating the code sequence of length n, which is sent to the modulator. 
The state of this convolutional code is given by the contents of the first $(K - 1)k$ 
elements of the shift register. 

FIGURE 7.1-1 
A convolutional encoder. 

The code rate of a block or convolutional code is denoted by $R_c$ and is given by 

$$R_c = \frac{k}{n} \tag{7.1-1}$$

The rate of a code represents the number of information bits sent in transmission of a 
binary symbol over the channel. The unit of $R_c$ is information bits per transmission. 
Since generally n > k, we have $R_c < 1$. 

Let us assume that a codeword of length n is transmitted using an N-dimensional 
constellation of size M, where M is assumed to be a power of 2 and $L = n/\log_2 M$ is 
assumed to be an integer representing the number of M-ary symbols transmitted per 
codeword. If the symbol duration is $T_s$, then the transmission time for k bits is $T = LT_s$ 
and the transmission rate is given by 

$$R = \frac{k}{LT_s} = \frac{k}{n} \times \frac{\log_2 M}{T_s} = R_c \frac{\log_2 M}{T_s} \quad \text{bits/s} \tag{7.1-2}$$

The dimension of the space of the encoded and modulated signals is LN, and using 
the dimensionality theorem as stated in Equation 4.6-5 we conclude that the minimum 
required transmission bandwidth is given by 

$$W = \frac{N}{2T_s} = \frac{RN}{2R_c \log_2 M} \quad \text{Hz} \tag{7.1-3}$$

and from Equation 7.1-3, the resulting spectral bit rate is given by 

$$r = \frac{R}{W} = \frac{2\log_2 M}{N} R_c \tag{7.1-4}$$

These equations indicate that compared with an uncoded system that uses the same 
modulation scheme, the bit rate is changed by a factor of $R_c$ and the bandwidth is 
changed by a factor of $1/R_c$, i.e., there is a decrease in rate and an increase in bandwidth. 
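
To make Equations 7.1-2 through 7.1-4 concrete, the sketch below evaluates them for one set of parameters. The function name `coded_link`, the (8, 4) code, and the symbol duration of 1 microsecond are assumptions chosen only for illustration.

```python
def coded_link(n: int, k: int, M: int, N: int, Ts: float):
    """Return (R, W, r): bit rate in bits/s, minimum bandwidth in Hz, spectral bit rate.

    n, k : codeword length and number of information bits
    M, N : constellation size (a power of 2) and constellation dimension
    Ts   : symbol duration in seconds
    """
    bits_per_symbol = M.bit_length() - 1          # log2(M) for a power of 2
    if M < 2 or (1 << bits_per_symbol) != M:
        raise ValueError("M must be a power of 2")
    if n % bits_per_symbol != 0:
        raise ValueError("L = n / log2(M) must be an integer")
    Rc = k / n                                    # Equation 7.1-1
    R = Rc * bits_per_symbol / Ts                 # Equation 7.1-2
    W = N / (2.0 * Ts)                            # Equation 7.1-3
    r = R / W                                     # Equation 7.1-4
    return R, W, r

if __name__ == "__main__":
    # Hypothetical example: an (8, 4) code sent with QPSK (M = 4, N = 2).
    R, W, r = coded_link(n=8, k=4, M=4, N=2, Ts=1e-6)
    print(f"R = {R:.3e} bits/s, W = {W:.3e} Hz, r = {r:.3f} bits/s/Hz")
```

For a fixed symbol duration, halving $R_c$ halves R while leaving W unchanged; equivalently, for a fixed bit rate R the required bandwidth grows by the factor $1/R_c$.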

If the average energy of the constellation is denoted by $\mathcal{E}_{av}$, then the energy per 
codeword $\mathcal{E}$ is given by 

$$\mathcal{E} = L\mathcal{E}_{av} = \frac{n}{\log_2 M}\mathcal{E}_{av} \tag{7.1-5}$$

and $\mathcal{E}_c$, energy per component of the codeword, is given by 

$$\mathcal{E}_c = \frac{\mathcal{E}}{n} = \frac{\mathcal{E}_{av}}{\log_2 M} \tag{7.1-6}$$

The energy per transmitted bit is denoted by $\mathcal{E}_b$ and can be found from 

$$\mathcal{E}_b = \frac{\mathcal{E}}{k} = \frac{\mathcal{E}_{av}}{R_c \log_2 M} \tag{7.1-7}$$

From Equations 7.1-6 and 7.1-7 we conclude that 

$$\mathcal{E}_c = R_c \mathcal{E}_b \tag{7.1-8}$$


The transmitted power is given by 

$$P = \frac{\mathcal{E}}{LT_s} = \frac{\mathcal{E}_{av}}{T_s} = R\,\frac{\mathcal{E}_{av}}{R_c \log_2 M} = R\mathcal{E}_b \tag{7.1-9}$$


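Modulation schemes frequently used with coding are BPSK, BFSK, and QPSK. The 
minimum required bandwidth and the resulting spectral bit rates for these modulation 
schemes† are given below: 

$$\begin{aligned}
\text{BPSK:} \quad & W = \frac{R}{R_c}, & r &= R_c \\
\text{BFSK:} \quad & W = \frac{R}{R_c}, & r &= R_c \\
\text{QPSK:} \quad & W = \frac{R}{2R_c}, & r &= 2R_c
\end{aligned} \tag{7.1-10}$$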


7.1-1 The Structure of Finite Fields 

To further explore properties of block codes, we need to introduce the notion of a finite 
field and its main properties. Simply stated, a field is a collection of objects that can 
be added, subtracted, multiplied, and divided. To define fields, we begin by defining 
Abelian groups. An Abelian group is a set with a binary operation that has the basic 
properties of addition. A set G and a binary operation denoted by + constitute an 
Abelian group if the following properties hold: 

1. The operation + is commutative; i.e., for any $a, b \in G$, $a + b = b + a$. 

2. The operation + is associative; i.e., for any $a, b, c \in G$, we have $(a + b) + c = 
a + (b + c)$. 

3. The operation + has an identity element denoted by 0 such that for any $a \in G$, 
$a + 0 = 0 + a = a$. 

4. For any $a \in G$ there exists an element $-a \in G$ such that $a + (-a) = (-a) + a = 0$. 
The element $-a$ is called the (additive) inverse of a. 

An Abelian group is usually denoted by {G, +, 0}. 

†BPSK is assumed to be transmitted as a double-sideband signal. 

TABLE 7.1-1 
Addition and Multiplication Tables for GF(2) 

  + | 0  1         · | 0  1 
 ---+------      ---+------ 
  0 | 0  1         0 | 0  0 
  1 | 1  0         1 | 0  1 

A finite field or Galois field† is a finite set F with two binary operations, addition and 
multiplication, denoted, respectively, by + and ·, satisfying the following properties: 

1. {F, +, 0} is an Abelian group. 

2. {F − {0}, ·, 1} is an Abelian group; i.e., the nonzero elements of the field constitute 
an Abelian group under multiplication with an identity element denoted by “1”. The 
multiplicative inverse of a nonzero $a \in F$ is denoted by $a^{-1}$. 

3. Multiplication is distributive with respect to addition: $a \cdot (b + c) = (b + c) \cdot a = 
a \cdot b + a \cdot c$. 

A field is usually denoted by {F, +, ·}. It is clear that $\mathbb{R}$, the set of real numbers, is a field 
(but not a finite field) with ordinary addition and multiplication. The set F = {0, 1} 
with modulo-2 addition and multiplication is an example of a Galois (finite) field. This 
field is called the binary field and is denoted by GF(2). The addition and multiplication 
tables for this field are given in Table 7.1-1. 


Characteristic of a Field and the Ground Field 

A fundamental theorem of algebra states that a Galois field with q elements, denoted 
by GF(q), exists if and only if $q = p^m$, where p is a prime and m is a positive integer. 
It can also be proved that when GF(q) exists, it is unique up to isomorphism. This 
means that any two Galois fields of the same size can be obtained from each other 
after renaming the elements. For the case of q = p, the Galois field can be denoted by 
GF(p) = {0, 1, 2, . . . , p − 1} with modulo-p addition and multiplication. For instance 
GF(5) = {0, 1, 2, 3, 4} is a finite field with modulo-5 addition and multiplication. 
When $q = p^m$, the resulting Galois field is called an extension field of GF(p). In this 
case GF(p) is called the ground field of GF($p^m$), and p is called the characteristic 
of GF($p^m$). 
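
The role of the prime p can be checked numerically. The sketch below tests whether every nonzero element of {0, 1, . . . , q − 1} has a multiplicative inverse under modulo-q arithmetic, which is the one field property that can fail for the integers modulo q. The function names are illustrative assumptions.

```python
def inverses_mod(q: int) -> dict:
    """Map each nonzero a in {1, ..., q-1} to its inverse modulo q, when one exists."""
    return {a: b for a in range(1, q) for b in range(1, q) if (a * b) % q == 1}

def is_field_mod(q: int) -> bool:
    """True if every nonzero element has a multiplicative inverse modulo q."""
    return len(inverses_mod(q)) == q - 1

if __name__ == "__main__":
    for q in (2, 3, 4, 5, 6, 7):
        print(q, is_field_mod(q), inverses_mod(q))
```

Modulo-5 arithmetic passes the test, whereas modulo-4 arithmetic does not, because 2 has no inverse modulo 4. This is consistent with the statement above: GF(4) exists, but it is an extension field of GF(2) rather than the integers modulo 4.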


†Named after French mathematician Evariste Galois (1811-1832). 

Polynomials over Finite Fields 

To study the structure of extension fields, we need to define polynomials over GF(p). 
A polynomial of degree m over GF(p) is a polynomial 

$$g(X) = g_0 + g_1 X + g_2 X^2 + \cdots + g_m X^m \tag{7.1-11}$$

where $g_i$, $0 \le i \le m$, are elements of GF(p) and $g_m \ne 0$. Addition and multiplication of 
polynomials follow standard addition and multiplication rules of ordinary polynomials 
except that addition and multiplication of the coefficients are done modulo-p. If $g_m = 1$, 
the polynomial is called monic. If a polynomial of degree m over GF(p) cannot be 
written as the product of two polynomials of lower degrees over the same Galois field, 
then the polynomial is called an irreducible polynomial. For instance, $X^2 + X + 1$ is 
an irreducible polynomial over GF(2), whereas $X^2 + 1$ is not irreducible over GF(2) 
because $X^2 + 1 = (X + 1)^2$. A polynomial that is both monic and irreducible is called 
a prime polynomial. A fundamental result of algebra states that a polynomial of degree 
m over GF(p) has m roots (some may be repeated), but the roots are not necessarily in 
GF(p). In general, the roots are in some extension field of GF(p). 
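
Irreducibility over GF(2) can be checked by brute force for small degrees. The sketch below is one possible approach, not a method from the text: a polynomial is stored as an integer whose bit i is the coefficient of $X^i$ (an assumed encoding), and it is declared irreducible if no polynomial of degree between 1 and half its degree divides it.

```python
def gf2_mod(a: int, b: int) -> int:
    """Remainder of a(X) divided by b(X), coefficients in GF(2); b must be nonzero."""
    deg_b = b.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_b:
        # Cancel the leading term of a by adding (XOR) a shifted copy of b.
        a ^= b << (a.bit_length() - 1 - deg_b)
    return a

def is_irreducible(g: int) -> bool:
    """True if g(X), of degree at least 1, has no divisor of degree 1..deg(g)//2."""
    m = g.bit_length() - 1
    if m < 1:
        return False
    # Integers 2 .. 2^(m//2 + 1) - 1 encode all polynomials of degree 1 .. m//2.
    return all(gf2_mod(g, d) != 0 for d in range(2, 1 << (m // 2 + 1)))

if __name__ == "__main__":
    print(is_irreducible(0b111))   # X^2 + X + 1
    print(is_irreducible(0b101))   # X^2 + 1 = (X + 1)^2
    print([bin(g) for g in range(8, 16) if is_irreducible(g)])  # degree 3
```

Every nonzero polynomial over GF(2) is monic, so for p = 2 the irreducible polynomials found this way are exactly the prime polynomials.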

The Structure of Extension Fields 

From the above definitions it is clear that there exist $p^m$ polynomials of degree less 
than m over GF(p); in particular these polynomials include two special polynomials g(X) = 0 and 
g(X) = 1. Now let us assume that g(X) is a prime (monic and irreducible) polynomial 
of degree m and consider the set of all polynomials of degree less than m over GF(p) 
with ordinary addition and with polynomial multiplication modulo-g(X). It can be 
shown that the set of these polynomials with the addition and multiplication operations 
defined above is a Galois field with $p^m$ elements. 

EXAMPLE 7.1-1. We know that $X^2 + X + 1$ is prime over GF(2); therefore this polynomial 
can be used to construct GF($2^2$) = GF(4). Let us consider all polynomials of 
degree less than 2 over GF(2). These polynomials are 0, 1, X, and X + 1 with addition 
and multiplication tables given in Table 7.1-2. Note that the multiplication rule basically 
entails multiplying the two polynomials, dividing the product by $g(X) = X^2 + X + 1$, 
and finding the remainder. This is what is meant by multiplying modulo-g(X). It is 
interesting to note that all nonzero elements of GF(4) can be written as powers of X; 
i.e., $X = X^1$, $X + 1 = X^2$, and $1 = X^3$. 


TABLE 7.1-2 
Addition and Multiplication Tables for GF(4) 

  +    | 0     1     X     X+1 
 ------+------------------------ 
  0    | 0     1     X     X+1 
  1    | 1     0     X+1   X 
  X    | X     X+1   0     1 
  X+1  | X+1   X     1     0 

  ·    | 0     1     X     X+1 
 ------+------------------------ 
  0    | 0     0     0     0 
  1    | 0     1     X     X+1 
  X    | 0     X     X+1   1 
  X+1  | 0     X+1   1     X 

TABLE 7.1-3 
Multiplication Table for GF(8) 

  ·        | 0   1         X         X+1       X^2       X^2+1     X^2+X     X^2+X+1 
 ----------+-------------------------------------------------------------------------- 
  0        | 0   0         0         0         0         0         0         0 
  1        | 0   1         X         X+1       X^2       X^2+1     X^2+X     X^2+X+1 
  X        | 0   X         X^2       X^2+X     X+1       1         X^2+X+1   X^2+1 
  X+1      | 0   X+1       X^2+X     X^2+1     X^2+X+1   X^2       1         X 
  X^2      | 0   X^2       X+1       X^2+X+1   X^2+X     X         X^2+1     1 
  X^2+1    | 0   X^2+1     1         X^2       X         X^2+X+1   X+1       X^2+X 
  X^2+X    | 0   X^2+X     X^2+X+1   1         X^2+1     X+1       X         X^2 
  X^2+X+1  | 0   X^2+X+1   X^2+1     X         1         X^2+X     X^2       X+1 
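
The multiplication tables for GF(4) and GF(8) can be regenerated mechanically from the rule "multiply, then reduce modulo-g(X)". The following sketch assumes that a polynomial over GF(2) is stored as an integer whose bit i is the coefficient of $X^i$; the function names are illustrative.

```python
def gf2_mulmod(a: int, b: int, g: int) -> int:
    """Product a(X) b(X) over GF(2), reduced modulo g(X); requires deg a, deg b < deg g."""
    m = g.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current shifted copy of a
        b >>= 1
        a <<= 1                  # multiply a by X
        if (a >> m) & 1:
            a ^= g               # degree reached m: reduce modulo g(X)
    return result

def poly_str(p: int) -> str:
    """Readable form of a bit-encoded polynomial, e.g. 0b110 -> 'X^2+X'."""
    if p == 0:
        return "0"
    terms = []
    for d in range(p.bit_length() - 1, -1, -1):
        if (p >> d) & 1:
            terms.append("1" if d == 0 else "X" if d == 1 else f"X^{d}")
    return "+".join(terms)

def mult_table(g: int):
    """Full multiplication table of the field constructed from the prime polynomial g."""
    size = 1 << (g.bit_length() - 1)
    return [[gf2_mulmod(a, b, g) for b in range(size)] for a in range(size)]

if __name__ == "__main__":
    for g in (0b111, 0b1011):    # X^2 + X + 1 and X^3 + X + 1
        for row in mult_table(g):
            print("  ".join(f"{poly_str(v):8s}" for v in row))
        print()
```

With this encoding the rows and columns are ordered 0, 1, X, X + 1, $X^2$, . . . , which matches the ordering used in the tables above.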


EXAMPLE 7.1-2. To generate GF($2^3$), we can use either of the two prime polynomials 
$g_1(X) = X^3 + X + 1$ or $g_2(X) = X^3 + X^2 + 1$. If $g(X) = X^3 + X + 1$ is used, 
the multiplication table for GF($2^3$) is given by Table 7.1-3. The addition table has 
a trivial structure. Here again note that $X^1 = X$, $X^2 = X^2$, $X^3 = X + 1$, $X^4 = 
X^2 + X$, $X^5 = X^2 + X + 1$, $X^6 = X^2 + 1$, and $X^7 = 1$. In other words, all nonzero 
elements of GF(8) can be written as powers of X. The nonzero elements of the field 
can be expressed either as polynomials of degree less than 3 or, equivalently, as $X^i$ for 
$1 \le i \le 7$. A third method for representing the field elements is to write coefficients 
of the polynomial as a vector of length 3. The representation of the form $X^i$ is the 
appropriate representation when multiplying field elements since $X^i \cdot X^j = X^{i+j}$, 
where i + j should be reduced modulo-7 because $X^7 = 1$. The polynomial and vector 
representations of field elements are more appropriate when adding field elements. A 
table of the three representations of field elements is given in Table 7.1-4. For instance, 
to multiply $X^2 + X + 1$ and $X^2 + 1$, we use their power representation as $X^5$ and $X^6$ 
and we have $(X^2 + X + 1)(X^2 + 1) = X^{11} = X^4 = X^2 + X$. 
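
The interplay between the power and vector representations can be sketched in a few lines: powers of X are generated once, stored in a lookup table, and multiplication then reduces to adding exponents modulo 7. The bit-vector encoding (bit i holds the coefficient of $X^i$) and the names `build_tables` and `mul` are assumptions made for this illustration.

```python
def build_tables(g: int, m: int):
    """Return (exp, log): exp[i] is X^i as a bit vector, log inverts that mapping."""
    exp, x = [], 1
    for _ in range((1 << m) - 1):
        exp.append(x)
        x <<= 1                  # multiply by X
        if (x >> m) & 1:
            x ^= g               # reduce modulo g(X)
    log = {v: i for i, v in enumerate(exp)}
    return exp, log

def mul(a: int, b: int, exp, log) -> int:
    """Multiply two field elements by adding exponents modulo the table length."""
    if a == 0 or b == 0:
        return 0
    return exp[(log[a] + log[b]) % len(exp)]

def add(a: int, b: int) -> int:
    """Addition is componentwise modulo-2 addition of the vector representations."""
    return a ^ b

if __name__ == "__main__":
    exp, log = build_tables(0b1011, 3)        # g(X) = X^3 + X + 1
    for i, v in enumerate(exp):
        print(f"X^{i} -> {v:03b}")
    # (X^2 + X + 1)(X^2 + 1), i.e., vectors 111 and 101
    print(f"{mul(0b111, 0b101, exp, log):03b}")
```

The lookup only works because X generates every nonzero element of this particular construction of GF(8); that property is what the text later calls primitivity.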


TABLE 7.1-4 
Three Representations for GF(8) Elements 

  Power        Polynomial    Vector 
  -----------  ------------  ------ 
  (none)       0             000 
  X^0 = X^7    1             001 
  X^1          X             010 
  X^2          X^2           100 
  X^3          X+1           011 
  X^4          X^2+X         110 
  X^5          X^2+X+1       111 
  X^6          X^2+1         101 

Primitive Elements and Primitive Polynomials 

For any nonzero element $\beta \in$ GF(q), the smallest positive integer i such that $\beta^i = 1$ is called the 
order of $\beta$. It is shown in Problem 7.1 that for any nonzero $\beta \in$ GF(q) we have $\beta^{q-1} = 1$; 
therefore the order of $\beta$ is at most equal to q − 1. A nonzero element of GF(q) is 
called a primitive element if its order is q − 1. We observe that in both Examples 7.1-1 
and 7.1-2, X is a primitive element. Primitive elements have the property that their 
powers generate all nonzero elements of the Galois field. Primitive elements are not 
unique; for instance, the reader can verify that in the GF(8) of Example 7.1-2, $X^2$ and 
X + 1 are both primitive elements; however, $1 \in$ GF(8) is not primitive since $1^1 = 1$. 

Since there are many prime polynomials of degree m, there are many constructs of 
GF($p^m$) which are all isomorphic; i.e., each can be obtained from another by renaming 
the elements. It is desirable that X be a primitive element of the Galois field GF($p^m$), 
since in this case all nonzero elements of the field can be expressed simply as powers of X 
as was shown in Table 7.1-4 for GF(8). If GF($p^m$), generated by g(X), is such that in this 
field X is a primitive element, then the polynomial g(X) is called a primitive polynomial. 
It can be shown that primitive polynomials exist for any degree m, and therefore, for 
any positive integer m and any prime p, it is possible to generate GF($p^m$) such that in 
this field X is primitive, i.e., all nonzero elements can be written as $X^i$, $0 \le i < p^m - 1$. 
We always assume that Galois fields are constructed using primitive polynomials. 

EXAMPLE 7.1-3. Polynomials $g_1(X) = X^4 + X + 1$ and $g_2(X) = X^4 + X^3 + X^2 + X + 1$ 
are two prime polynomials of degree 4 over GF(2) that can be used to generate GF($2^4$). 
However, in the Galois field generated by $g_1(X)$, X is a primitive element, hence $g_1(X)$ 
is a primitive polynomial, but in the field generated by $g_2(X)$, X is not primitive; in fact 
in this field $X^5 = 1$ since $X^5 + 1 = (X + 1)g_2(X)$. Therefore, $g_2(X)$ is not a primitive 
polynomial. 

It can be shown that any prime polynomial g(X) of degree m over GF(p) divides 
$X^{p^m-1} - 1$ (which is the same as $X^{p^m-1} + 1$ when p = 2). However, it is possible 
that g(X) divides $X^i - 1$ for some $i < p^m - 1$ as 
well. For instance, $X^4 + X^3 + X^2 + X + 1$ divides $X^{15} + 1$, but it also divides $X^5 + 1$. It 
can be shown that if a prime polynomial g(X) has the property that the smallest integer 
i for which g(X) divides $X^i - 1$ is $i = p^m - 1$, then g(X) is primitive. This means that 
we have two equivalent definitions for a primitive polynomial. The first definition states 
that a primitive polynomial g(X) is a prime polynomial of degree m such that if GF($p^m$) 
is constructed based on g(X), in the resulting field X is a primitive element. The second 
definition states that g(X), a prime polynomial of degree m, is primitive if g(X) does 
not divide $X^i - 1$ for any $i < p^m - 1$. All roots of a primitive polynomial of degree 
m are primitive elements of GF($p^m$). Primitive polynomials are usually tabulated for 
different values of m. Table 7.1-5 gives some primitive polynomials for $2 \le m \le 12$. 
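
For p = 2 the second definition can be tested directly by finding the smallest i for which $X^i = 1$ modulo-g(X). The sketch below does this by repeated multiplication by X. It assumes the bit encoding in which bit i of an integer is the coefficient of $X^i$, and it assumes that the caller supplies a polynomial with a nonzero constant term; the function names are illustrative.

```python
def order_of_x(g: int) -> int:
    """Smallest i >= 1 such that X^i = 1 modulo g(X) over GF(2)."""
    m = g.bit_length() - 1
    if m < 1 or (g & 1) == 0:
        raise ValueError("g must have degree >= 1 and a nonzero constant term")
    x, i = 1, 0
    while True:
        x <<= 1                  # multiply by X
        if (x >> m) & 1:
            x ^= g               # reduce modulo g(X)
        i += 1
        if x == 1:
            return i

def x_has_maximal_order(g: int) -> bool:
    """True if the order of X modulo g(X) equals 2^m - 1, where m = deg g."""
    m = g.bit_length() - 1
    return order_of_x(g) == (1 << m) - 1

if __name__ == "__main__":
    print(order_of_x(0b10011))   # g1(X) = X^4 + X + 1
    print(order_of_x(0b11111))   # g2(X) = X^4 + X^3 + X^2 + X + 1
```

Applied to the two polynomials of Example 7.1-3, the orders are 15 and 5, in agreement with the example: only $g_1(X)$ passes the test.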

EXAMPLE 7.1-4. GF(16) can be constructed using $g(X) = X^4 + X + 1$. If $\alpha$ is a root 
of g(X), then $\alpha$ is a primitive element of GF(16) and all nonzero elements of GF(16) 
can be written as $\alpha^i$ for $0 \le i < 15$ with $\alpha^{15} = \alpha^0 = 1$. Table 7.1-6 presents elements 
of GF(16) as powers of $\alpha$, as polynomials in $\alpha$, and finally as binary vectors of length 
4. Note that $\beta = \alpha^3$ is a nonprimitive element in this field since $\beta^5 = \alpha^{15} = 1$; i.e., 
the order of $\beta$ is 5. It is clearly seen that $\alpha^6$, $\alpha^{12}$, and $\alpha^9$ are also elements of order 5, 
whereas $\alpha^5$ and $\alpha^{10}$ are elements of order 3. Primitive elements of this field are $\alpha$, $\alpha^2$, 
$\alpha^4$, $\alpha^8$, $\alpha^7$, $\alpha^{14}$, $\alpha^{13}$, and $\alpha^{11}$. 

TABLE 7.1-5 
Primitive Polynomials of Orders 2 through 12 

  m     g(X) 
  --    ------------------------- 
  2     X^2 + X + 1 
  3     X^3 + X + 1 
  4     X^4 + X + 1 
  5     X^5 + X^2 + 1 
  6     X^6 + X + 1 
  7     X^7 + X^3 + 1 
  8     X^8 + X^4 + X^3 + X^2 + 1 
  9     X^9 + X^4 + 1 
  10    X^10 + X^3 + 1 
  11    X^11 + X^2 + 1 
  12    X^12 + X^6 + X^4 + X + 1 
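
The orders quoted in Example 7.1-4 follow from the single fact that $\alpha$ has order 15: the order of $\alpha^i$ is the smallest positive j for which ij is a multiple of 15. The sketch below tabulates these orders; the function names are illustrative assumptions.

```python
def order_of_power(i: int, n: int = 15) -> int:
    """Order of alpha^i when alpha has order n: smallest j >= 1 with i*j = 0 (mod n)."""
    j = 1
    while (i * j) % n != 0:
        j += 1
    return j

def exponents_by_order(n: int = 15) -> dict:
    """Group the exponents 0..n-1 according to the order of alpha^i."""
    groups = {}
    for i in range(n):
        groups.setdefault(order_of_power(i, n), []).append(i)
    return groups

if __name__ == "__main__":
    for order, exps in sorted(exponents_by_order().items()):
        print(f"order {order:2d}: exponents {exps}")
```

The exponents with order 15 are exactly those that share no common factor with 15, which is why GF(16) has eight primitive elements.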


Minimal Polynomials and Conjugate Elements 

The minimal polynomial of a field element is the lowest-degree monic polynomial over 
the ground field that has the element as its root. Let $\beta$ be a nonzero element of GF($2^m$). 
Then the minimal polynomial of $\beta$, denoted by $\phi_\beta(X)$, is a monic polynomial of lowest 
degree with coefficients in GF(2) such that $\beta$ is a root of $\phi_\beta(X)$, i.e., $\phi_\beta(\beta) = 0$. 
Obviously $\phi_\beta(X)$ is a prime polynomial over GF(2) and divides any other polynomial 
over GF(2) that has a root at $\beta$; i.e., if f(X) is any polynomial over GF(2) such that 
$f(\beta) = 0$, then f(X) can be factorized as $f(X) = a(X)\phi_\beta(X)$. In the following 
paragraph we see how to obtain the minimal polynomial of a field element. 

TABLE 7.1-6 
Elements of GF(16) 

  Power               Polynomial                          Vector 
  ------------------  ----------------------------------  ------ 
  (none)              0                                   0000 
  α^0 = α^15          1                                   0001 
  α^1                 α                                   0010 
  α^2                 α^2                                 0100 
  α^3                 α^3                                 1000 
  α^4                 α + 1                               0011 
  α^5                 α^2 + α                             0110 
  α^6                 α^3 + α^2                           1100 
  α^7                 α^3 + α + 1                         1011 
  α^8                 α^2 + 1                             0101 
  α^9                 α^3 + α                             1010 
  α^10                α^2 + α + 1                         0111 
  α^11                α^3 + α^2 + α                       1110 
  α^12                α^3 + α^2 + α + 1                   1111 
  α^13                α^3 + α^2 + 1                       1101 
  α^14                α^3 + 1                             1001 

Since $\beta \in$ GF($2^m$) and $\beta \ne 0$, we know that $\beta^{2^m-1} = 1$. However, it is possible 
that for some integer $\ell < m$ we have $\beta^{2^\ell-1} = 1$. For instance, in GF(16) if $\beta = \alpha^5$, 
then $\beta^3 = \beta^{2^2-1} = 1$; therefore for this $\beta$ we have $\ell = 2$. It can be shown that for any 
$\beta \in$ GF($2^m$), the minimal polynomial $\phi_\beta(X)$ is given by 

$$\phi_\beta(X) = \prod_{i=0}^{\ell-1} \left(X + \beta^{2^i}\right) \tag{7.1-12}$$

where $\ell$ is the smallest positive integer such that $\beta^{2^\ell-1} = 1$. The roots of $\phi_\beta(X)$, i.e., elements 
of the form $\beta^{2^i}$, $1 \le i \le \ell - 1$, are called conjugates of $\beta$. It can be shown that all 
conjugates of an element of a finite field have the same order. This means that conjugates 
of primitive elements are also primitive. We add here that although all conjugates have 
the same order, this does not mean that all elements of the same order are necessarily 
conjugates. All elements of the finite field that are conjugates of each other are said 
to belong to the same conjugacy class. Therefore to find the minimal polynomial of 
$\beta \in$ GF(q), we take the following steps: 

1. Find the conjugacy class of $\beta$, i.e., all elements of the form $\beta^{2^i}$ for $0 \le i \le \ell - 1$, 
where $\ell$ is the smallest positive integer such that $\beta^{2^\ell} = \beta$. 

2. Find $\phi_\beta(X)$ as a monic polynomial whose roots are in the conjugacy class of $\beta$. This 
is done by using Equation 7.1-12. 

The $\phi_\beta(X)$ obtained by this procedure is guaranteed to be a prime polynomial with 
coefficients in GF(2). 

EXAMPLE 7.1-5. To find the minimal polynomial of $\beta = \alpha^5$ in GF(16), we observe 
that $\beta^4 = \alpha^{20} = \alpha^5 = \beta$. Hence, $\ell = 2$, and the conjugacy class is $\{\beta, \beta^2\}$. Therefore, 

$$\begin{aligned}
\phi_\beta(X) &= \prod_{i=0}^{1} \left(X + \beta^{2^i}\right) \\
&= (X + \beta)(X + \beta^2) \\
&= (X + \alpha^5)(X + \alpha^{10}) \\
&= X^2 + (\alpha^5 + \alpha^{10})X + \alpha^{15} \\
&= X^2 + X + 1
\end{aligned} \tag{7.1-13}$$

For $\gamma = \alpha^3$ we have $\ell = 4$ and the conjugacy class is $\{\gamma, \gamma^2, \gamma^4, \gamma^8\}$. Therefore, 

$$\begin{aligned}
\phi_\gamma(X) &= \prod_{i=0}^{3} \left(X + \gamma^{2^i}\right) \\
&= (X + \gamma)(X + \gamma^2)(X + \gamma^4)(X + \gamma^8) \\
&= (X + \alpha^3)(X + \alpha^6)(X + \alpha^{12})(X + \alpha^9) \\
&= X^4 + X^3 + X^2 + X + 1
\end{aligned} \tag{7.1-14}$$

To find the minimal polynomial of $\alpha$, we note that $\alpha^{16} = \alpha$, hence $\ell = 4$ and the 
conjugacy class is $\{\alpha, \alpha^2, \alpha^4, \alpha^8\}$. The resulting minimal polynomial is 

$$\begin{aligned}
\phi_\alpha(X) &= \prod_{i=0}^{3} \left(X + \alpha^{2^i}\right) \\
&= (X + \alpha)(X + \alpha^2)(X + \alpha^4)(X + \alpha^8) \\
&= X^4 + X + 1
\end{aligned} \tag{7.1-15}$$

For $\delta = \alpha^7$ we again have $\ell = 4$, and the conjugacy class is $\{\delta, \delta^2, \delta^4, \delta^8\}$. The minimal 
polynomial is 

$$\begin{aligned}
\phi_\delta(X) &= \prod_{i=0}^{3} \left(X + \delta^{2^i}\right) \\
&= (X + \alpha^7)(X + \alpha^{14})(X + \alpha^{13})(X + \alpha^{11}) \\
&= X^4 + X^3 + 1
\end{aligned} \tag{7.1-16}$$

Note that $\alpha$ and $\delta$ are both primitive elements, but they belong to two different conjugacy 
classes and thus have different minimal polynomials. 
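
The two-step procedure used in Example 7.1-5 can be sketched as follows for GF(16) built from $g(X) = X^4 + X + 1$. Field elements are stored as 4-bit integers (bit i is the coefficient of $\alpha^i$), which is an assumed encoding, and the names `conjugacy_class` and `minimal_polynomial` are illustrative.

```python
G, M = 0b10011, 4            # g(X) = X^4 + X + 1
N = (1 << M) - 1             # number of nonzero elements, 15

def build_tables(g: int, m: int):
    """exp[i] is alpha^i as a bit vector; log maps a nonzero element to its exponent."""
    exp, x = [], 1
    for _ in range((1 << m) - 1):
        exp.append(x)
        x <<= 1
        if (x >> m) & 1:
            x ^= g
    return exp, {v: i for i, v in enumerate(exp)}

EXP, LOG = build_tables(G, M)

def mul(a: int, b: int) -> int:
    """Field multiplication through the power representation."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % N]

def conjugacy_class(e: int) -> list:
    """Step 1: exponents of the conjugates of alpha^e, i.e., e, 2e, 4e, ... modulo 15."""
    cls, x = [], e % N
    while x not in cls:
        cls.append(x)
        x = (2 * x) % N
    return cls

def minimal_polynomial(e: int) -> list:
    """Step 2: coefficients (lowest degree first) of the product of (X + alpha^j)."""
    poly = [1]
    for j in conjugacy_class(e):
        root = EXP[j]
        new = [0] * (len(poly) + 1)
        for d, c in enumerate(poly):
            new[d + 1] ^= c              # contribution of c * X
            new[d] ^= mul(c, root)       # contribution of c * alpha^j
        poly = new
    return poly

if __name__ == "__main__":
    for e in (5, 3, 1, 7):
        print(e, conjugacy_class(e), minimal_polynomial(e))
```

Although the intermediate products have coefficients in GF(16), every coefficient of the final polynomial is 0 or 1, as the text states it must be.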

We conclude our discussion of Galois field properties by observing that all the $p^m$ 
elements of GF($p^m$) are the roots of the equation 

$$X^{p^m} - X = 0 \tag{7.1-17}$$

or equivalently, all nonzero elements of GF($p^m$) are the roots of 

$$X^{p^m-1} - 1 = 0 \tag{7.1-18}$$

This means that the polynomial $X^{2^m-1} - 1$ can be uniquely factored over GF(2) into the 
product of the minimal polynomials corresponding to the conjugacy classes of nonzero 
elements of GF($2^m$). In fact $X^{2^m-1} - 1$ can be factorized over GF(2) as the product 
of all prime polynomials over GF(2), other than X itself, whose degree divides m. For more details on the 
structure of finite fields and the proofs of the properties we covered here, the reader is 
referred to MacWilliams and Sloane (1977), Wicker (1995), and Blahut (2003). 

7.1-2 Vector Spaces 

A vector space over a field of scalars {F, +, ·} is an Abelian group {V, +, 0} whose 
elements are denoted by boldface symbols such as $\mathbf{v}$ and called vectors, with vector 
addition + and identity element $\mathbf{0}$; and an operation called scalar multiplication for 
each $c \in F$ and each $\mathbf{v} \in V$ that is denoted by $c \cdot \mathbf{v}$ such that the following properties 
are satisfied: 

1. $c \cdot \mathbf{v} \in V$ 

2. $c \cdot (\mathbf{v}_1 + \mathbf{v}_2) = c \cdot \mathbf{v}_1 + c \cdot \mathbf{v}_2$ 

3. $c_1 \cdot (c_2 \cdot \mathbf{v}) = (c_1 \cdot c_2) \cdot \mathbf{v}$ 

4. $(c_1 + c_2) \cdot \mathbf{v} = c_1 \cdot \mathbf{v} + c_2 \cdot \mathbf{v}$ 

5. $1 \cdot \mathbf{v} = \mathbf{v}$ 

It can be easily shown that the following properties are satisfied: 

1. $0 \cdot \mathbf{v} = \mathbf{0}$ 

2. $c \cdot \mathbf{0} = \mathbf{0}$ 

3. $(-c) \cdot \mathbf{v} = c \cdot (-\mathbf{v}) = -(c \cdot \mathbf{v})$ 

We will be mainly dealing with vector spaces over the scalar field GF(2). In this 
case a vector space V is a collection of binary n-tuples such that if $\mathbf{v}_1, \mathbf{v}_2 \in V$, 
then $\mathbf{v}_1 + \mathbf{v}_2 \in V$, where + denotes componentwise binary addition, or componentwise 
EXCLUSIVE-OR operation. Note that since we can choose $\mathbf{v}_2 = \mathbf{v}_1$, we have 
$\mathbf{0} \in V$. 
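
The closure property that characterizes a vector space over GF(2) is easy to check by enumeration for short n-tuples. In the sketch below, the function names and the two example vectors are assumptions chosen for illustration.

```python
from itertools import product

def span_gf2(basis):
    """All GF(2) linear combinations of the given binary n-tuples."""
    n = len(basis[0])
    vectors = set()
    for coeffs in product((0, 1), repeat=len(basis)):
        v = tuple(sum(c * b[j] for c, b in zip(coeffs, basis)) % 2 for j in range(n))
        vectors.add(v)
    return vectors

def is_closed(vectors) -> bool:
    """True if the componentwise EXCLUSIVE-OR of any two members is again a member."""
    return all(
        tuple(a ^ b for a, b in zip(u, v)) in vectors
        for u in vectors
        for v in vectors
    )

if __name__ == "__main__":
    V = span_gf2([(1, 0, 1), (0, 1, 1)])
    print(sorted(V), is_closed(V), (0, 0, 0) in V)
    # The two generating vectors alone are not closed under addition.
    print(is_closed({(1, 0, 1), (0, 1, 1)}))
```

Because each vector added to itself gives the all-zero tuple, any nonempty closed set automatically contains the zero vector, which is the observation made at the end of the paragraph above.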


7.2 
GENERAL PROPERTIES OF LINEAR BLOCK CODES 

A q-ary block code C consists of a set of M vectors of length n denoted by $\mathbf{c}_m = 
(c_{m1}, c_{m2}, \ldots, c_{mn})$, $1 \le m \le M$, and called codewords, whose components are selected 
from an alphabet of q symbols, or elements. When the alphabet consists of two symbols, 
0 and 1, the code is a binary code. It is interesting to note that when q is a power of 2, 
i.e., $q = 2^b$ where b is a positive integer, each q-ary symbol has an equivalent binary 
representation consisting of b bits; thus, a nonbinary code of block length N can be 
mapped into a binary code of block length n = bN. 

There are $2^n$ possible binary sequences of length n that can serve as codewords in a 
binary block code. From these $2^n$ 
sequences, we may select $M = 2^k$ codewords (k < n) to form a code. Thus, a block of k 
information bits is mapped into a codeword of length n selected from the set of $M = 2^k$ 
codewords. We refer to the resulting block code as an (n, k) code, with rate $R_c = k/n$. 
More generally, in a code having q symbols, there are $q^n$ possible codewords. A subset 
of $M = q^k$ codewords may be selected to transmit k-symbol blocks of information. 

Besides the code rate parameter $R_c$, an important parameter of a codeword is its
weight, which is simply the number of nonzero elements that it contains. In general,
each codeword has its own weight. The set of all weights in a code constitutes the
weight distribution of the code. When all the $M$ codewords have equal weight, the code
is called a fixed-weight code or a constant-weight code.

A subset of block codes, called linear block codes, has been particularly well studied
during the last few decades. The reason for the popularity of linear block codes is that
linearity guarantees easier implementation and analysis of these codes. In addition, it
is remarkable that the performance of the class of linear block codes is similar to the
performance of the general class of block codes. Therefore, we can limit our study to
the subclass of linear block codes without sacrificing system performance.

A linear block code $C$ is a $k$-dimensional subspace of an $n$-dimensional space and
is usually called an $(n, k)$ code. For binary codes, it follows from Problem 7.11 that a
linear block code is a collection of $2^k$ binary sequences of length $n$ such that for any
two codewords $\mathbf{c}_1, \mathbf{c}_2 \in C$ we have $\mathbf{c}_1 + \mathbf{c}_2 \in C$. Obviously, $\mathbf{0}$ is a codeword of any
linear block code.

7.2-1 Generator and Parity Check Matrices

In a linear block code, the mapping from the set of $M = 2^k$ information sequences of
length $k$ to the corresponding $2^k$ codewords of length $n$ can be represented by a $k \times n$
matrix $G$ called the generator matrix as

$$\mathbf{c}_m = \mathbf{u}_m G, \qquad 1 \le m \le 2^k \tag{7.2-1}$$

where $\mathbf{u}_m$ is a binary vector of length $k$ denoting the information sequence and $\mathbf{c}_m$
is the corresponding codeword. The rows of $G$ are denoted by $\mathbf{g}_i$, $1 \le i \le k$, and are
the codewords corresponding to the information sequences $(1, 0, \ldots, 0)$,
$(0, 1, 0, \ldots, 0)$, $\ldots$, $(0, \ldots, 0, 1)$. Thus,

$$G = \begin{bmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \\ \vdots \\ \mathbf{g}_k \end{bmatrix} \tag{7.2-2}$$

and hence,

$$\mathbf{c}_m = \sum_{i=1}^{k} u_{mi}\, \mathbf{g}_i \tag{7.2-3}$$

where the summation is in GF(2), i.e., modulo-2 summation.

From Equation 7.2-2 it is clear that the set of codewords of $C$ is exactly the set of
linear combinations of the rows of $G$, i.e., the row space of $G$. Two linear block codes
$C_1$ and $C_2$ are called equivalent if the corresponding generator matrices have the same
row space, possibly after a permutation of columns.

If the generator matrix $G$ has the following structure

$$G = [\, I_k \mid P \,] \tag{7.2-4}$$

where $I_k$ is a $k \times k$ identity matrix and $P$ is a $k \times (n - k)$ matrix, the resulting linear block
code is called systematic. In systematic codes the first $k$ components of the codeword
are equal to the information sequence, and the following $n - k$ components, called the
parity check bits, provide the redundancy for protection against errors. It can be shown
that any linear block code has a systematic equivalent; i.e., its generator matrix can
be put in the form given by Equation 7.2-4 by elementary row operations and column
permutation.

Since $C$ is a $k$-dimensional subspace of the $n$-dimensional binary space, its orthogonal
complement, i.e., the set of all $n$-dimensional binary vectors that are orthogonal
to the codewords of $C$, is an $(n - k)$-dimensional subspace of the $n$-dimensional
space, and therefore it defines an $(n, n - k)$ code which is denoted by $C^{\perp}$ and is called
the dual code of $C$. The generator matrix of the dual code is an $(n - k) \times n$ matrix
whose rows are orthogonal to the rows of $G$, the generator matrix of $C$. The generator
matrix of the dual code is called the parity check matrix of the original code $C$ and is
denoted by $H$. Since any codeword of $C$ is orthogonal to all rows of $H$, we conclude
that for all $\mathbf{c} \in C$

$$\mathbf{c} H^t = \mathbf{0} \tag{7.2-5}$$

Also, if for some binary $n$-dimensional vector $\mathbf{c}$ we have $\mathbf{c} H^t = \mathbf{0}$, then $\mathbf{c}$ belongs to
the orthogonal complement of the row space of $H$, i.e., $\mathbf{c} \in C$. Therefore, a necessary and
sufficient condition for $\mathbf{c} \in \{0, 1\}^n$ to be a codeword is that it satisfy Equation 7.2-5.
Since rows of $G$ are codewords, we conclude that

$$G H^t = \mathbf{0} \tag{7.2-6}$$

In the special case of systematic codes, where $G = [\, I_k \mid P \,]$, the parity check matrix is
given by

$$H = [\, -P^t \mid I_{n-k} \,] \tag{7.2-7}$$

which obviously satisfies $G H^t = \mathbf{0}$. For binary codes $-P^t = P^t$ and $H = [\, P^t \mid I_{n-k} \,]$.
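Example 7.2-1. Consider a (7, 4) linear block code with

$$G = [\, I_4 \mid P \,] = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{bmatrix} \tag{7.2-8}$$

Obviously this is a systematic code. The parity check matrix for this code is obtained
from Equation 7.2-7 as

$$H = [\, P^t \mid I_{n-k} \,] = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{bmatrix} \tag{7.2-9}$$

If $\mathbf{u} = (u_1, u_2, u_3, u_4)$ is an information sequence, the corresponding codeword
$\mathbf{c} = (c_1, c_2, \ldots, c_7)$ is given by

$$\begin{aligned} c_1 &= u_1 \\ c_2 &= u_2 \\ c_3 &= u_3 \\ c_4 &= u_4 \\ c_5 &= u_1 + u_2 + u_3 \\ c_6 &= u_2 + u_3 + u_4 \\ c_7 &= u_1 + u_2 + u_4 \end{aligned} \tag{7.2-10}$$

and from Equations 7.2-10 it can be easily verified that all codewords $\mathbf{c}$ satisfy
Equation 7.2-5.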
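
The verification mentioned in Example 7.2-1 can also be done numerically. The following Python sketch is an illustration only: the helper names `encode` and `syndrome` are assumptions introduced here, not notation from the text. It encodes all 16 information sequences with the matrix $G$ of Equation 7.2-8 and checks that each codeword satisfies $\mathbf{c}H^t = \mathbf{0}$ for the matrix $H$ of Equation 7.2-9.

```python
from itertools import product

# Generator matrix of the (7, 4) code in Equation 7.2-8, G = [I_4 | P].
G = [
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]
# Parity check matrix of Equation 7.2-9, H = [P^t | I_3].
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]

def encode(u, G):
    """Return c = uG with all arithmetic in GF(2) (hypothetical helper)."""
    n = len(G[0])
    return [sum(u[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n)]

def syndrome(c, H):
    """Return c H^t over GF(2); it is all zeros exactly when c is a codeword."""
    return [sum(cj * hj for cj, hj in zip(c, row)) % 2 for row in H]

codewords = [encode(list(u), G) for u in product([0, 1], repeat=4)]
all_pass = all(syndrome(c, H) == [0, 0, 0] for c in codewords)
print(len(codewords), all_pass)
```

A vector that is not a codeword, such as a single 1 in the first position, produces a nonzero result (the first column of $H$), which is the property that parity checking relies on.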

7.2-2 Weight and Distance for Linear Block Codes

The weight of a codeword $\mathbf{c} \in C$ is denoted by $w(\mathbf{c})$ and is the number of nonzero
components of that codeword. Since $\mathbf{0}$ is a codeword of all linear block codes, we
conclude that each linear block code has one codeword of weight zero. The Hamming
distance between two codewords $\mathbf{c}_1, \mathbf{c}_2 \in C$, denoted by $d(\mathbf{c}_1, \mathbf{c}_2)$, is the number of
components at which $\mathbf{c}_1$ and $\mathbf{c}_2$ differ. It is clear that the weight of a codeword is its
distance from $\mathbf{0}$.

The distance between $\mathbf{c}_1$ and $\mathbf{c}_2$ is the weight of $\mathbf{c}_1 - \mathbf{c}_2$, and since in linear block
codes $\mathbf{c}_1 - \mathbf{c}_2$ is a codeword, then $d(\mathbf{c}_1, \mathbf{c}_2) = w(\mathbf{c}_1 - \mathbf{c}_2)$. We clearly see that in linear
block codes there exists a one-to-one correspondence between weight and the distance
between codewords. This means that the set of possible distances from any codeword
$\mathbf{c} \in C$ to all other codewords is equal to the set of weights of different codewords,
and thus is independent of $\mathbf{c}$. In other words, in a linear block code, looking from any
codeword to all other codewords, one observes the same set of distances, regardless of
the codeword one is looking from. Also note that in binary linear block codes we can
substitute $\mathbf{c}_1 - \mathbf{c}_2$ with $\mathbf{c}_1 + \mathbf{c}_2$.

The minimum distance of a code is the minimum of all possible distances between
distinct codewords of the code, i.e.,

$$d_{\min} = \min_{\substack{\mathbf{c}_1, \mathbf{c}_2 \in C \\ \mathbf{c}_1 \ne \mathbf{c}_2}} d(\mathbf{c}_1, \mathbf{c}_2) \tag{7.2-11}$$

The minimum weight of a code is the minimum of the weights of all nonzero codewords,
which for linear block codes is equal to the minimum distance.

$$w_{\min} = \min_{\substack{\mathbf{c} \in C \\ \mathbf{c} \ne \mathbf{0}}} w(\mathbf{c}) \tag{7.2-12}$$

There exists a close relation between the minimum weight of a linear block code and
the columns of the parity check matrix $H$. We have previously seen that the necessary
and sufficient condition for $\mathbf{c} \in \{0, 1\}^n$ to be a codeword is that $\mathbf{c}H^t = \mathbf{0}$. If we choose
$\mathbf{c}$ to be a codeword of minimum weight, from this relation we conclude that $w_{\min}$ (or
$d_{\min}$) columns of $H$ are linearly dependent. On the other hand, since there exists no
codeword of weight less than $d_{\min}$, no fewer than $d_{\min}$ columns of $H$ can be linearly
dependent. Therefore, $d_{\min}$ represents the minimum number of columns of $H$ that can
be linearly dependent. In other words, every set of $d_{\min} - 1$ columns of $H$ is linearly
independent.

In certain modulation schemes there exists a close relation between Hamming
distance and Euclidean distance of the codewords. In binary antipodal signaling (for
instance, BPSK modulation) the 0 and 1 components of a codeword $\mathbf{c} \in C$ are mapped
to $-\sqrt{\mathcal{E}_c}$ and $+\sqrt{\mathcal{E}_c}$, respectively. Therefore, if $\mathbf{s}_m$ is the vector corresponding to the
modulated sequence of codeword $\mathbf{c}_m$, we have

$$s_{mj} = (2c_{mj} - 1)\sqrt{\mathcal{E}_c}, \qquad 1 \le j \le n, \quad 1 \le m \le M \tag{7.2-13}$$

and therefore,

$$d^2_{\mathbf{s}_m, \mathbf{s}_{m'}} = 4\mathcal{E}_c\, d(\mathbf{c}_m, \mathbf{c}_{m'}) \tag{7.2-14}$$

where $d_{\mathbf{s}_m, \mathbf{s}_{m'}}$ denotes the Euclidean distance between the modulated sequences and
$d(\mathbf{c}_m, \mathbf{c}_{m'})$ is the Hamming distance between the corresponding codewords. From the
above we have

$$d^2_{E\min} = 4\mathcal{E}_c\, d_{\min} \tag{7.2-15}$$

where $d_{E\min}$ is the minimum Euclidean distance of the BPSK modulated sequences
corresponding to the codewords. Using Equation 7.1-8, we conclude that

$$d^2_{E\min} = 4R_c\mathcal{E}_b\, d_{\min} \tag{7.2-16}$$

For the binary orthogonal modulations, e.g., binary orthogonal FSK, we similarly
have

$$d^2_{E\min} = 2R_c\mathcal{E}_b\, d_{\min} \tag{7.2-17}$$


7.2-3 The Weight Distribution Polynomial 


An $(n, k)$ code has $2^k$ codewords that can have weights between 0 and $n$. In any
linear block code there exists one codeword of weight 0, and the weights of nonzero
codewords can be between $d_{\min}$ and $n$. The weight distribution polynomial (WEP) or
weight enumeration function (WEF) of a code is a polynomial that specifies the number
of codewords of different weights in a code. The weight distribution polynomial or
weight enumeration function is denoted by $A(Z)$ and is defined by

$$A(Z) = \sum_{i=0}^{n} A_i Z^i = 1 + \sum_{i=d_{\min}}^{n} A_i Z^i \tag{7.2-18}$$

where $A_i$ denotes the number of codewords of weight $i$. The following properties of
the weight enumeration function for linear block codes are straightforward:

$$A(1) = \sum_{i=0}^{n} A_i = 2^k, \qquad A(0) = 1 \tag{7.2-19}$$


The weight enumeration function for many block codes is unknown. For low rate
codes the weight enumeration function can be obtained by using a computer search.
The MacWilliams identity expresses the weight enumeration function of a code in
terms of the weight enumeration function of its dual code. By this identity, the weight
enumeration function of a code $A(Z)$ is related to the weight enumeration function of
its dual code $A_d(Z)$ by

$$A(Z) = 2^{-(n-k)} (1 + Z)^n A_d\!\left(\frac{1 - Z}{1 + Z}\right) \tag{7.2-20}$$

The weight enumeration function of a code is closely related to the distance
enumerator function of a constellation as defined in Equation 4.2-74. Note that for a linear
block code, the set of distances seen from any codeword to other codewords is
independent of the codeword from which these distances are seen. Therefore, in linear block
codes the error bound is independent of the transmitted codeword, and thus, without
loss of generality, we can always assume that the all-zero codeword $\mathbf{0}$ is transmitted.
The value of $d^2$ in Equation 4.2-74 depends on the modulation scheme. For BPSK
modulation from Equation 7.2-14 we have

$$d_E^2(\mathbf{s}_m) = 4R_c\mathcal{E}_b\, w(\mathbf{c}_m) \tag{7.2-21}$$

where $d_E(\mathbf{s}_m)$ denotes the Euclidean distance between $\mathbf{s}_m$ and the modulated sequence
corresponding to $\mathbf{0}$. For orthogonal binary FSK modulation we have

$$d_E^2(\mathbf{s}_m) = 2R_c\mathcal{E}_b\, w(\mathbf{c}_m) \tag{7.2-22}$$

The distance enumerator function for BPSK is given by

$$T(X) = \sum_{i=d_{\min}}^{n} A_i X^{4R_c\mathcal{E}_b i} = \bigl(A(Z) - 1\bigr)\Big|_{Z = X^{4R_c\mathcal{E}_b}} \tag{7.2-23}$$

and for orthogonal BFSK by

$$T(X) = \sum_{i=d_{\min}}^{n} A_i X^{2R_c\mathcal{E}_b i} = \bigl(A(Z) - 1\bigr)\Big|_{Z = X^{2R_c\mathcal{E}_b}} \tag{7.2-24}$$

Another version of the weight enumeration function provides information about
the weight of the codewords as well as the weight of the corresponding information
sequences. This polynomial is called the input-output weight enumeration function
(IOWEF), denoted by $B(Y, Z)$, and is defined as

$$B(Y, Z) = \sum_{i=0}^{n} \sum_{j=0}^{k} B_{ij} Y^j Z^i \tag{7.2-25}$$

where $B_{ij}$ is the number of codewords of weight $i$ that are generated by information
sequences of weight $j$. Clearly,

$$A_i = \sum_{j=0}^{k} B_{ij} \tag{7.2-26}$$

and for linear block codes we have $B(0, 0) = B_{00} = 1$. It is also clear that

$$A(Z) = B(Y, Z)\Big|_{Y=1} \tag{7.2-27}$$

A third form of the weight enumeration function, called the conditional weight
enumeration function (CWEF), is defined by

$$B_j(Z) = \sum_{i=0}^{n} B_{ij} Z^i \tag{7.2-28}$$

and it represents the weight enumeration function of all codewords corresponding to
information sequences of weight $j$. From Equations 7.2-28 and 7.2-25 it is easy to see
that

$$B_j(Z) = \frac{1}{j!} \frac{\partial^j}{\partial Y^j} B(Y, Z)\bigg|_{Y=0} \tag{7.2-29}$$

Example 7.2-2. In the code discussed in Example 7.2-1, there are $2^4 = 16$ codewords
with possible weights between 0 and 7. Substituting all possible information sequences
of the form $\mathbf{u} = (u_1, u_2, u_3, u_4)$ and generating the codewords, we can verify that for
this code $d_{\min} = 3$ and there are 7 codewords of weight 3 and 7 codewords of weight
4. There exist one codeword of weight 7 and one codeword of weight 0. Therefore,

$$A(Z) = 1 + 7Z^3 + 7Z^4 + Z^7 \tag{7.2-30}$$

It is also easy to verify that for this code

$$\begin{aligned} B_{00} &= 1 & B_{31} &= 3 & B_{32} &= 3 & B_{33} &= 1 \\ B_{41} &= 1 & B_{42} &= 3 & B_{43} &= 3 & B_{74} &= 1 \end{aligned}$$

Hence,

$$B(Y, Z) = 1 + 3YZ^3 + 3Y^2Z^3 + Y^3Z^3 + YZ^4 + 3Y^2Z^4 + 3Y^3Z^4 + Y^4Z^7 \tag{7.2-31}$$

and

$$\begin{aligned} B_0(Z) &= 1 \\ B_1(Z) &= 3Z^3 + Z^4 \\ B_2(Z) &= 3Z^3 + 3Z^4 \\ B_3(Z) &= Z^3 + 3Z^4 \\ B_4(Z) &= Z^7 \end{aligned} \tag{7.2-32}$$
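
The counts quoted in Example 7.2-2 can be reproduced by brute-force enumeration, which is the kind of computer search the text mentions for small codes. The sketch below is a minimal illustration; the dictionary layout (`A[i]`, `B[(i, j)]`, `cwef[j]`) is an assumption made here for convenience, not notation from the text.

```python
from collections import Counter
from itertools import product

# Generator matrix of the (7, 4) code of Example 7.2-1 (Equation 7.2-8).
G = [
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]

def encode(u):
    """c = uG over GF(2)."""
    return [sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

A = Counter()  # A[i]: number of codewords of weight i
B = Counter()  # B[(i, j)]: codewords of weight i produced by inputs of weight j
for u in product([0, 1], repeat=4):
    c = encode(u)
    i, j = sum(c), sum(u)
    A[i] += 1
    B[(i, j)] += 1

d_min = min(i for i in A if i > 0)

# Conditional WEF B_j(Z), stored as {i: B_ij} for each input weight j.
cwef = {j: {i: cnt for (i, jj), cnt in B.items() if jj == j} for j in range(5)}

print(sorted(A.items()), d_min)
```

Summing `B[(i, j)]` over `j` for a fixed `i` recovers `A[i]`, which is Equation 7.2-26.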

7.2-4 Error Probability of Linear Block Codes

Two types of error probability can be studied when linear block codes are employed.
The block error probability or word error probability is defined as the probability of
transmitting a codeword $\mathbf{c}_m$ and detecting a different codeword $\mathbf{c}_{m'}$. The second type
of error probability is the bit error probability, defined as the probability of receiving
a transmitted information bit in error.

Block Error Probability 

Linearity of the code guarantees that the distances from $\mathbf{c}_m$ to all other codewords are
independent of the choice of $\mathbf{c}_m$. Therefore, without loss of generality we can assume
that the all-zero codeword $\mathbf{0}$ is transmitted.

To determine the block (word) error probability $P_e$, we note that an error occurs
if the receiver declares any codeword $\mathbf{c}_m \ne \mathbf{0}$ as the transmitted codeword. The
probability of this event is denoted by the pairwise error probability $P_{\mathbf{0} \to \mathbf{c}_m}$, as defined in
Section 4.2-3. Therefore,

$$P_e \le \sum_{\substack{\mathbf{c}_m \in C \\ \mathbf{c}_m \ne \mathbf{0}}} P_{\mathbf{0} \to \mathbf{c}_m} \tag{7.2-33}$$

where in general $P_{\mathbf{0} \to \mathbf{c}_m}$ depends on the Hamming distance between $\mathbf{0}$ and $\mathbf{c}_m$, which
is equal to $w(\mathbf{c}_m)$, in a way that depends on the modulation scheme employed for
transmission of the codewords. Since for codewords of equal weight we have the
same $P_{\mathbf{0} \to \mathbf{c}_m}$, we conclude that

$$P_e \le \sum_{i=d_{\min}}^{n} A_i P_2(i) \tag{7.2-34}$$

where $P_2(i)$ denotes the pairwise error probability (PEP) between two codewords with
Hamming distance $i$.

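From Equation 6.8-9 we know that

$$P_{\mathbf{0} \to \mathbf{c}_m} \le \prod_{i=1}^{n} \sum_{y_i \in \mathcal{Y}} \sqrt{p(y_i \mid 0)\, p(y_i \mid c_{mi})} \tag{7.2-35}$$

Following Example 6.8-1 we define

$$\Delta = \sum_{y \in \mathcal{Y}} \sqrt{p(y \mid 0)\, p(y \mid 1)} \tag{7.2-36}$$

With this definition, each position with $c_{mi} = 0$ contributes a factor of
$\sum_{y} p(y \mid 0) = 1$ and each position with $c_{mi} = 1$ contributes a factor of $\Delta$, so
Equation 7.2-35 reduces to

$$P_{\mathbf{0} \to \mathbf{c}_m} = P_2\bigl(w(\mathbf{c}_m)\bigr) \le \Delta^{w(\mathbf{c}_m)} \tag{7.2-37}$$

Substituting this result into Equation 7.2-34 results in

$$P_e \le \sum_{i=d_{\min}}^{n} A_i \Delta^i \tag{7.2-38}$$

or

$$P_e \le A(\Delta) - 1 \tag{7.2-39}$$

where $A(Z)$ is the weight enumerating function of the linear block code.
From the inequality

$$\sum_{y \in \mathcal{Y}} \left( \sqrt{p(y \mid 0)} - \sqrt{p(y \mid 1)} \right)^2 \ge 0 \tag{7.2-40}$$

we easily conclude that

$$\Delta = \sum_{y \in \mathcal{Y}} \sqrt{p(y \mid 0)\, p(y \mid 1)} \le 1 \tag{7.2-41}$$

and hence, for $i \ge d_{\min}$,

$$\Delta^i \le \Delta^{d_{\min}} \tag{7.2-42}$$

Using this result in Equation 7.2-38 yields the simpler, but looser, bound

$$P_e \le (2^k - 1)\, \Delta^{d_{\min}} \tag{7.2-43}$$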


Bit Error Probability 

In general, errors at different locations of an information sequence of length $k$ can occur
with different probabilities. We define the average of these error probabilities as the bit
error probability for a linear block code. We again assume that the all-zero sequence is
transmitted; then the probability that a specific codeword of weight $i$ will be decoded at
the detector is equal to $P_2(i)$. The number of codewords of weight $i$ that correspond to
information sequences of weight $j$ is denoted by $B_{ij}$. Therefore, when $\mathbf{0}$ is transmitted,
the expected number of information bits received in error is bounded by

$$\bar{b} \le \sum_{j=0}^{k} j \sum_{i=d_{\min}}^{n} B_{ij} P_2(i) \tag{7.2-44}$$

Since for $0 < i < d_{\min}$ we have $B_{ij} = 0$, we can write this as

$$\bar{b} \le \sum_{j=0}^{k} j \sum_{i=0}^{n} B_{ij} P_2(i) \tag{7.2-45}$$

The (average) bit error probability of the linear block code $P_b$ is defined as the ratio
of the expected number of bits received in error to the total number of transmitted bits,
i.e.,

$$P_b = \frac{\bar{b}}{k} \le \frac{1}{k} \sum_{j=0}^{k} j \sum_{i=0}^{n} B_{ij} P_2(i) \le \frac{1}{k} \sum_{j=0}^{k} j \sum_{i=0}^{n} B_{ij} \Delta^i \tag{7.2-46}$$

where in the last step we have used Equation 7.2-37. From Equation 7.2-28 we see
that the last sum is simply $B_j(\Delta)$; therefore,

$$P_b \le \frac{1}{k} \sum_{j=0}^{k} j\, B_j(\Delta) \tag{7.2-47}$$

We can also express the bit error probability in terms of the IOWEF by using
Equation 7.2-25 as

$$P_b \le \frac{1}{k} \sum_{i=0}^{n} \sum_{j=0}^{k} j\, B_{ij} \Delta^i = \frac{1}{k} \frac{\partial}{\partial Y} B(Y, Z)\bigg|_{Y=1,\, Z=\Delta} \tag{7.2-48}$$
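
To make the bounds of this section concrete, the sketch below evaluates them for the (7, 4) code of Example 7.2-2. The channel is an assumption chosen only for illustration: a binary symmetric channel with crossover probability `p`, for which the definition in Equation 7.2-36 gives $\Delta = 2\sqrt{p(1-p)}$. The function name `bhattacharyya` and the value of `p` are hypothetical.

```python
import math

def bhattacharyya(p_y_given_0, p_y_given_1):
    """Delta of Equation 7.2-36 for a channel with a finite output alphabet."""
    return sum(math.sqrt(a * b) for a, b in zip(p_y_given_0, p_y_given_1))

# Weight data of the (7, 4) code, taken from Example 7.2-2.
k = 4
A = {0: 1, 3: 7, 4: 7, 7: 1}                      # A_i
B = {(0, 0): 1, (3, 1): 3, (3, 2): 3, (3, 3): 1,  # B_ij, keyed as (i, j)
     (4, 1): 1, (4, 2): 3, (4, 3): 3, (7, 4): 1}

p = 0.01  # assumed crossover probability of a binary symmetric channel
delta = bhattacharyya([1 - p, p], [p, 1 - p])

d_min = min(i for i in A if i > 0)
block_bound = sum(a * delta**i for i, a in A.items()) - 1         # Eq. 7.2-39
loose_bound = (2**k - 1) * delta**d_min                           # Eq. 7.2-43
bit_bound = sum(j * b * delta**i for (i, j), b in B.items()) / k  # Eq. 7.2-48

print(delta, block_bound, loose_bound, bit_bound)
```

As expected from Equation 7.2-42, the bound of Equation 7.2-43 is never smaller than that of Equation 7.2-39, and the bit error bound is never larger than the block error bound because $j/k \le 1$.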


■ 7.3 

SOME SPECIFIC LINEAR BLOCK CODES 

In this section, we briefly describe some linear block codes that are frequently encoun- 
tered in practice and list their important parameters. Additional classes of linear codes 
are introduced in our study of cyclic codes in Section 7.9. 


7.3-1 Repetition Codes 

A binary repetition code is an $(n, 1)$ code with two codewords of length $n$. One codeword
is the all-zero codeword, and the other one is the all-one codeword. This code has a
rate of $R_c = 1/n$ and a minimum distance of $d_{\min} = n$. The dual of a repetition code is
an $(n, n - 1)$ code consisting of all binary sequences of length $n$ with even parity. The
minimum distance of the dual code is clearly $d_{\min} = 2$.


7.3-2 Hamming Codes 

Hamming codes are among the earliest codes studied in coding theory. Hamming codes
are linear block codes with parameters $n = 2^m - 1$ and $k = 2^m - m - 1$, for $m \ge 3$.
Hamming codes are best described in terms of their parity check matrix $H$, which is an
$(n - k) \times n = m \times (2^m - 1)$ matrix. The $2^m - 1$ columns of $H$ consist of all possible
binary vectors of length $m$ excluding the all-zero vector. The rate of a Hamming code
is given by

$$R_c = \frac{2^m - m - 1}{2^m - 1} \tag{7.3-1}$$

which is close to 1 for large values of $m$.

Since the columns of $H$ include all nonzero sequences of length $m$, the sum of any
two columns is another column. In other words, there always exist three columns that
are linearly dependent. Because the columns are nonzero and distinct, no single column
and no pair of columns is linearly dependent. Therefore, for Hamming codes, independent
of the value of $m$, $d_{\min} = 3$.

The weight distribution polynomial for the class of Hamming $(n, k)$ codes is known
and is expressed as (see Problem 7.23)

$$A(Z) = \frac{1}{n + 1} \left[ (1 + Z)^n + n (1 + Z)^{(n-1)/2} (1 - Z)^{(n+1)/2} \right] \tag{7.3-2}$$


Example 7.3-1. To generate the $H$ matrix for a (7, 4) Hamming code (corresponding
to $m = 3$), we have to use all nonzero sequences of length 3 as columns of $H$. We can
arrange these columns in such a way that the resulting code is systematic as

$$H = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{bmatrix} \tag{7.3-3}$$

This is the parity check matrix derived in Example 7.2-1 and given by Equation 7.2-9.
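
The construction of $H$ and the claim $d_{\min} = 3$ can be checked by a short search. The sketch below is an illustration under stated assumptions: it lists the columns in natural binary order rather than the systematic arrangement of Equation 7.3-3, and the function names are hypothetical. It finds the smallest set of columns that sums to zero over GF(2), which by Section 7.2-2 equals $d_{\min}$.

```python
from itertools import combinations

def hamming_parity_check(m):
    """m x (2^m - 1) matrix whose columns are all nonzero binary m-tuples."""
    n = 2**m - 1
    cols = [[(v >> (m - 1 - r)) & 1 for r in range(m)] for v in range(1, n + 1)]
    return [[col[r] for col in cols] for r in range(m)]

def min_dependent_columns(H):
    """Smallest number of columns of H that add to zero over GF(2)."""
    m, n = len(H), len(H[0])
    # Pack each column into an integer so that GF(2) addition becomes XOR.
    cols = [sum(H[r][j] << r for r in range(m)) for j in range(n)]
    for size in range(1, n + 1):
        for subset in combinations(cols, size):
            acc = 0
            for c in subset:
                acc ^= c
            if acc == 0:
                return size
    return None

# Systematic arrangement from Equation 7.3-3.
H_sys = [
    [1, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
]
print(min_dependent_columns(hamming_parity_check(3)),
      min_dependent_columns(hamming_parity_check(4)),
      min_dependent_columns(H_sys))
```

Reordering the columns changes the code only by a permutation of coordinates, so the result does not depend on which arrangement is used.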


7.3-3 Maximum-Length Codes 

Maximum-length codes are duals of Hamming codes; therefore these are a family of
$(2^m - 1, m)$ codes for $m \ge 3$. The generator matrix of a maximum-length code is the
parity check matrix of a Hamming code, and therefore its columns are all sequences
of length $m$ with the exception of the all-zero sequence. In Problem 7.23 it is shown
that maximum-length codes are constant-weight codes; i.e., all codewords, except the
all-zero codeword, have the same weight, and this weight is equal to $2^{m-1}$. Therefore,
the weight enumeration function for these codes is given by

$$A(Z) = 1 + (2^m - 1) Z^{2^{m-1}} \tag{7.3-4}$$

Using this weight distribution function and applying the MacWilliams identity given
in Equation 7.2-20, we can derive the weight enumeration function of the Hamming
code as given in Equation 7.3-2.
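
The derivation just described can be carried out mechanically with integer polynomial arithmetic. The following sketch assumes the MacWilliams identity in the form $A(Z) = 2^{-(n-k)}(1+Z)^n A_d\bigl((1-Z)/(1+Z)\bigr)$, expanded term by term as $\sum_i A_{d,i}(1+Z)^{n-i}(1-Z)^i$. The helper names are hypothetical, and polynomials are stored as coefficient lists with the constant term first.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_pow(a, e):
    """a(Z) raised to the nonnegative integer power e."""
    out = [1]
    for _ in range(e):
        out = poly_mul(out, a)
    return out

def macwilliams(dual_weights, n, k):
    """WEF coefficients of an (n, k) code from {i: A_d,i} of its dual code."""
    total = [0] * (n + 1)
    for i, count in dual_weights.items():
        term = poly_mul(poly_pow([1, 1], n - i), poly_pow([1, -1], i))
        for d, coeff in enumerate(term):
            total[d] += count * coeff
    # The division is exact when dual_weights is a valid dual weight distribution.
    return [t // 2 ** (n - k) for t in total]

m = 3
n, k = 2**m - 1, 2**m - m - 1
max_length_wef = {0: 1, 2 ** (m - 1): 2**m - 1}  # Equation 7.3-4
hamming_wef = macwilliams(max_length_wef, n, k)
print(hamming_wef)
```

For $m = 3$ the result agrees with the weight distribution of the (7, 4) code in Equation 7.2-30.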


7.3-4 Reed-Muller Codes 

Reed-Muller codes, introduced by Reed (1954) and Muller (1954), are a class of linear
block codes with flexible parameters that are particularly interesting due to the existence
of simple decoding algorithms for them.

A Reed-Muller code with block length $n = 2^m$ and order $r < m$ is an $(n, k)$ linear
block code with

$$n = 2^m, \qquad k = \sum_{i=0}^{r} \binom{m}{i}, \qquad d_{\min} = 2^{m-r} \tag{7.3-5}$$

whose generator matrix is given by

$$G = \begin{bmatrix} G_0 \\ G_1 \\ G_2 \\ \vdots \\ G_r \end{bmatrix} \tag{7.3-6}$$

where $G_0$ is a $1 \times n$ matrix of all 1s

$$G_0 = [\, 1 \;\; 1 \;\; 1 \;\; \cdots \;\; 1 \,] \tag{7.3-7}$$

and $G_1$ is an $m \times n$ matrix whose columns are distinct binary sequences of length $m$
put in natural binary order.

$$G_1 = \begin{bmatrix} 0 & 0 & 0 & \cdots & 1 & 1 \\ 0 & 0 & 0 & \cdots & 1 & 1 \\ \vdots & & & & & \vdots \\ 0 & 0 & 1 & \cdots & 1 & 1 \\ 0 & 1 & 0 & \cdots & 0 & 1 \end{bmatrix} \tag{7.3-8}$$

$G_2$ is an $\binom{m}{2} \times n$ matrix whose rows are obtained by bitwise multiplication of two rows
of $G_1$ at a time. Similarly, $G_i$ for $2 < i \le r$ is an $\binom{m}{i} \times n$ matrix whose rows are obtained
by bitwise multiplication of $i$ rows of $G_1$ at a time.

Example 7.3-2. The first-order Reed-Muller code with block length 8 is an (8, 4)
code with generator matrix

$$G = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix} \tag{7.3-9}$$

This code can be obtained from a (7, 4) Hamming code by adding one extra
parity bit to make the overall weight of each codeword even. This code has a minimum
distance of 4. The second-order Reed-Muller code with block length 8 has the generator
matrix

$$G = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix} \tag{7.3-10}$$

and has a minimum distance of 2.
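
The row construction described for Reed-Muller codes translates directly into code. The sketch below is illustrative only: the function names are hypothetical, and the row ordering within each $G_i$ follows `itertools.combinations`, which is an assumption (any ordering generates the same code). It builds $G$ for given $r$ and $m$ and finds the minimum distance by exhaustive search, which is practical only for small $k$.

```python
from itertools import combinations, product
from math import comb

def reed_muller_generator(r, m):
    """Rows of G = [G_0; G_1; ...; G_r] for the order-r code of length 2^m."""
    n = 2**m
    # Row t of G_1 holds bit t (most significant first) of the column index.
    g1 = [[(col >> (m - 1 - t)) & 1 for col in range(n)] for t in range(m)]
    rows = [[1] * n]                                  # G_0
    for order in range(1, r + 1):
        for picks in combinations(range(m), order):   # rows of G_order
            row = [1] * n
            for t in picks:
                row = [a & b for a, b in zip(row, g1[t])]
            rows.append(row)
    return rows

def minimum_distance(G):
    """Minimum weight over all nonzero codewords uG (exhaustive search)."""
    k, n = len(G), len(G[0])
    best = n
    for u in product([0, 1], repeat=k):
        if any(u):
            c = [sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
            best = min(best, sum(c))
    return best

G1 = reed_muller_generator(1, 3)
G2 = reed_muller_generator(2, 3)
print(len(G1), minimum_distance(G1), len(G2), minimum_distance(G2))
```

For $m = 3$ the matrices produced coincide with Equations 7.3-9 and 7.3-10, and the distances agree with $d_{\min} = 2^{m-r}$.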

7.3-5 Hadamard Codes

Hadamard signals were introduced in Section 3.2-4 as examples of orthogonal
signaling schemes. A Hadamard code is obtained by selecting as codewords the rows of a
Hadamard matrix. A Hadamard matrix $M_n$ is an $n \times n$ matrix ($n$ is an even integer)
of 1s and 0s with the property that any row differs from any other row in exactly $n/2$
positions.† One row of the matrix contains all zeros. The other rows each contain $n/2$
zeros and $n/2$ ones.

For $n = 2$, the Hadamard matrix is

$$M_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \tag{7.3-11}$$

Furthermore, from $M_n$, we can generate the Hadamard matrix $M_{2n}$ according to the
relation

$$M_{2n} = \begin{bmatrix} M_n & M_n \\ M_n & \overline{M}_n \end{bmatrix} \tag{7.3-12}$$

where $\overline{M}_n$ denotes the complement (0s replaced by 1s and vice versa) of $M_n$. Thus, by
substituting Equation 7.3-11 into Equation 7.3-12, we obtain

$$M_4 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix} \tag{7.3-13}$$

The complement of $M_4$ is

$$\overline{M}_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \tag{7.3-14}$$

Now the rows of $M_4$ and $\overline{M}_4$ form a linear binary code of block length $n = 4$ having
$2n = 8$ codewords. The minimum distance of the code is $d_{\min} = n/2 = 2$.

By repeated application of Equation 7.3-12, we can generate Hadamard codes
with block length $n = 2^m$, $k = \log_2 2n = \log_2 2^{m+1} = m + 1$, and $d_{\min} = n/2 = 2^{m-1}$,
where $m$ is a positive integer. In addition to the important special cases where $n = 2^m$,
Hadamard codes of other block lengths are possible, but the resulting codes are not
linear.
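
The recursive construction of Equation 7.3-12 is easy to check numerically. The sketch below is a minimal illustration restricted to $n$ a power of 2; the function names are assumptions introduced here. It builds $M_n$, forms the code from the rows of $M_n$ and their complements, and tests linearity and minimum distance directly.

```python
def hadamard(n):
    """Binary Hadamard matrix M_n for n a power of 2 (n >= 2), via Eq. 7.3-12."""
    M = [[0, 0], [0, 1]]                      # M_2, Equation 7.3-11
    while len(M) < n:
        comp = [[1 - b for b in row] for row in M]
        M = [row + row for row in M] + [row + crow for row, crow in zip(M, comp)]
    return M

def hadamard_code(n):
    """Codewords: the rows of M_n followed by the rows of its complement."""
    M = hadamard(n)
    return M + [[1 - b for b in row] for row in M]

def is_linear(code):
    """True if the set of codewords is closed under componentwise XOR."""
    words = {tuple(c) for c in code}
    return all(tuple(a ^ b for a, b in zip(x, y)) in words
               for x in words for y in words)

def min_distance(code):
    """Smallest Hamming distance between two distinct codewords."""
    return min(sum(a != b for a, b in zip(x, y))
               for i, x in enumerate(code) for y in code[i + 1:])

code8 = hadamard_code(8)
print(len(code8), is_linear(code8), min_distance(code8))
```

For $n = 8$ this gives 16 codewords, so $k = 4 = m + 1$ with $m = 3$, and the minimum distance is $n/2 = 4$.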


†In Section 3.2-4 the elements of the Hadamard matrix were denoted +1 and −1, resulting in mutually
orthogonal rows. We also note that the $M = 2^k$ signal waveforms, constructed from Hadamard codewords
by mapping each bit in a codeword into a binary PSK signal, are orthogonal.

7.3-6 Golay Codes

The Golay code (Golay (1949)) is a binary linear (23, 12) code with $d_{\min} = 7$. The
extended Golay code is obtained by adding an overall parity bit to the (23, 12) Golay
code such that each codeword has even parity. The resulting code is a binary linear
(24, 12) code with $d_{\min} = 8$. The weight distribution polynomials of the Golay code and
the extended Golay code are known and are given by

$$\begin{aligned} A_G(Z) &= 1 + 253Z^7 + 506Z^8 + 1288Z^{11} + 1288Z^{12} + 506Z^{15} + 253Z^{16} + Z^{23} \\ A_{EG}(Z) &= 1 + 759Z^8 + 2576Z^{12} + 759Z^{16} + Z^{24} \end{aligned} \tag{7.3-15}$$

We discuss the generation of the Golay code in Section 7.9-5.


■ 7.4 

OPTIMUM SOFT DECISION DECODING OF LINEAR BLOCK CODES 

In this section, we derive the performance of linear binary block codes on an AWGN 
channel when optimum (unquantized) soft decision decoding is employed at the re- 
ceiver. The bits of a codeword may be transmitted by any one of the binary signaling 
methods described in Chapter 3. For our purposes, we consider binary (or quaternary) 
coherent PSK, which is the most efficient method, and binary orthogonal FSK with 
either coherent detection or noncoherent detection. 

From Chapter 4, we know that the optimum receiver, in the sense of minimizing 
the average probability of a codeword error, for the AWGN channel can be realized as a 
parallel bank of M = 2 k filters matched to the M possible transmitted waveforms. The 
outputs of the M matched filters at the end of each signaling interval, which encom- 
passes the transmission of n binary symbols in the codeword, are compared, and the 
codeword corresponding to the largest matched filter output is selected. Alternatively, 
M cross-correlators can be employed. In either case, the receiver implementation can 
be simplified. That is, an equivalent optimum receiver can be realized by use of a sin- 
gle filter (or cross-correlator) matched to the binary PSK waveform used to transmit 
each bit in the codeword, followed by a decoder that forms the M decision variables 
corresponding to the M codewords. 

To be specific, let $r_j$, $j = 1, 2, \ldots, n$, represent the $n$ sampled outputs of the
matched filter for any particular codeword. Since the signaling is binary coherent PSK,
the output $r_j$ may be expressed either as

$$r_j = \sqrt{\mathcal{E}_c} + n_j \tag{7.4-1}$$

when the $j$th bit of a codeword is a 1, or as

$$r_j = -\sqrt{\mathcal{E}_c} + n_j \tag{7.4-2}$$

when the $j$th bit is a 0. The variables $\{n_j\}$ represent additive white Gaussian noise at the
sampling instants. Each $n_j$ has zero mean and variance $\frac{1}{2}N_0$. From knowledge of the
$M$ possible transmitted codewords and upon reception of $\{r_j\}$, the optimum decoder
forms the $M$ correlation metrics

$$CM_m = C(\mathbf{r}, \mathbf{c}_m) = \sum_{j=1}^{n} (2c_{mj} - 1)\, r_j, \qquad m = 1, 2, \ldots, M \tag{7.4-3}$$

where $c_{mj}$ denotes the bit in the $j$th position of the $m$th codeword. Thus, if $c_{mj} = 1$, the
weighting factor $2c_{mj} - 1 = 1$; and if $c_{mj} = 0$, the weighting factor $2c_{mj} - 1 = -1$. In
this manner, the weighting $2c_{mj} - 1$ aligns the signal components in $\{r_j\}$ such that the
correlation metric corresponding to the actual transmitted codeword will have a mean
value $n\sqrt{\mathcal{E}_c}$, while the other $M - 1$ metrics will have smaller mean values.
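
The decoder described above can be sketched in a few lines for a small code. The example below is illustrative and makes several assumptions: it uses the (7, 4) code of Example 7.2-1, computes all $M = 16$ metrics exhaustively, and the function names and the optional `transmit` channel model (real Gaussian noise of variance $N_0/2$ per sample) are introduced here, not taken from the text.

```python
import math
import random
from itertools import product

# (7, 4) code of Example 7.2-1 (Equation 7.2-8).
G = [
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]
codewords = [[sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]
             for u in product([0, 1], repeat=4)]

def correlation_metrics(r, codewords):
    """CM_m = sum_j (2 c_mj - 1) r_j for every codeword (Equation 7.4-3)."""
    return [sum((2 * c - 1) * rj for c, rj in zip(cw, r)) for cw in codewords]

def soft_decode(r, codewords):
    """Return the codeword with the largest correlation metric."""
    metrics = correlation_metrics(r, codewords)
    best = max(range(len(codewords)), key=metrics.__getitem__)
    return codewords[best]

def transmit(cw, Ec, N0, rng):
    """Matched filter outputs per Equations 7.4-1 and 7.4-2 (assumed model)."""
    sigma = math.sqrt(N0 / 2)
    return [(2 * c - 1) * math.sqrt(Ec) + rng.gauss(0.0, sigma) for c in cw]

rng = random.Random(0)
sent = codewords[11]
received = transmit(sent, Ec=1.0, N0=0.5, rng=rng)
print(sent, soft_decode(received, codewords))
```

Without noise, the metric of the transmitted codeword equals $n\sqrt{\mathcal{E}_c}$, and a single weak sample with the wrong sign does not change the decision, because every competing codeword differs in at least $d_{\min} = 3$ positions.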

Although the computations involved in forming the correlation metrics for soft 
decision decoding according to Equation 7.4-3 are relatively simple, it may still be im- 
practical to compute Equation 7.4-3 for all the possible codewords when the number 
of codewords is large, e.g., M > 2 10 . In such a case it is still possible to implement 
soft decision decoding using algorithms which employ techniques for discarding im- 
probable codewords without computing their entire correlation metrics as given by 
Equation 7.4-3. Several different types of soft decision decoding algorithms have been 
described in the technical literature. The interested reader is referred to the papers 
by Forney (1966b), Weldon (1971), Chase (1972), Wainberg and Wolf (1973), Wolf
(1978), and Matis and Modestino (1982). 

Block and Bit Error Probability in Soft Decision Decoding 

We can use the general bounds on the block error probability derived in
Equations 7.2-39 and 7.2-43 to find bounds on the block error probability for soft
decision decoding. The value of $\Delta$ defined by Equation 7.2-36 has to be found under the
specific modulation employed to transmit codeword components. In Example 6.8-1 it
was shown that for BPSK modulation we have $\Delta = e^{-\mathcal{E}_c/N_0}$, and since $\mathcal{E}_c = R_c\mathcal{E}_b$, we
obtain

$$P_e \le \bigl(A(Z) - 1\bigr)\Big|_{Z = e^{-R_c\mathcal{E}_b/N_0}} \tag{7.4-4}$$

where $A(Z)$ is the weight enumerating polynomial of the code.

The simple bound of Equation 7.2-43 under soft decision decoding reduces to

$$P_e \le (2^k - 1)\, e^{-R_c d_{\min} \mathcal{E}_b/N_0} \tag{7.4-5}$$

In Problem 7.18 it is shown that for binary orthogonal signaling, for instance,
orthogonal BFSK, we have $\Delta = e^{-\mathcal{E}_c/2N_0}$. Using this result, we obtain the simple
bound

$$P_e \le (2^k - 1)\, e^{-R_c d_{\min} \mathcal{E}_b/2N_0} \tag{7.4-6}$$

for orthogonal BFSK modulation.

Using the inequality $2^k - 1 < 2^k = e^{k \ln 2}$, we obtain

$$P_e \le e^{-\gamma_b \left( R_c d_{\min} - k \ln 2 / \gamma_b \right)} \qquad \text{for BPSK} \tag{7.4-7}$$

and

$$P_e \le e^{-\frac{\gamma_b}{2} \left( R_c d_{\min} - 2k \ln 2 / \gamma_b \right)} \qquad \text{for orthogonal BFSK} \tag{7.4-8}$$

where as usual $\gamma_b$ denotes $\mathcal{E}_b/N_0$, the SNR per bit.

When the upper bound in Equation 7.4-7 is compared with the performance of
an uncoded binary PSK system, which is upper-bounded as $\frac{1}{2}\exp(-\gamma_b)$, we find that
coding yields a gain of approximately $10 \log(R_c d_{\min} - k \ln 2/\gamma_b)$ dB. We may call this
the coding gain. We note that its value depends on the code parameters and also on the
SNR per bit $\gamma_b$. For large values of $\gamma_b$, the limit of the coding gain, i.e., $R_c d_{\min}$, is called
the asymptotic coding gain.
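
As a worked illustration of the coding gain expression, the sketch below evaluates it for the extended Golay (24, 12) code with $d_{\min} = 8$ from Section 7.3-6. The function names and the choice of SNR values are assumptions made for the example, and the logarithm is taken to base 10, as is usual for decibels.

```python
import math

def coding_gain_db(n, k, d_min, gamma_b_db):
    """10 log10(R_c d_min - k ln 2 / gamma_b), with gamma_b supplied in dB."""
    Rc = k / n
    gamma_b = 10 ** (gamma_b_db / 10)
    arg = Rc * d_min - k * math.log(2) / gamma_b
    # At low SNR the bound of Equation 7.4-7 is too loose to show any gain.
    return 10 * math.log10(arg) if arg > 0 else float("-inf")

def asymptotic_coding_gain_db(n, k, d_min):
    """Limit of the coding gain for large gamma_b: 10 log10(R_c d_min)."""
    return 10 * math.log10(k / n * d_min)

for snr_db in (10.0, 15.0, 20.0):
    print(snr_db, round(coding_gain_db(24, 12, 8, snr_db), 2))
print(round(asymptotic_coding_gain_db(24, 12, 8), 2))
```

The finite-SNR values approach the asymptotic gain of about 6 dB from below as $\gamma_b$ increases, which reflects the $k \ln 2/\gamma_b$ penalty in the bound.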

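Similar to the block error probability, we can use Equation 7.2-48 to bound the bit
error probability for BPSK and orthogonal BFSK modulation. We obtain

$$P_b \le \frac{1}{k} \frac{\partial}{\partial Y} B(Y, Z)\bigg|_{Y=1,\, Z=\exp(-R_c\mathcal{E}_b/N_0)} \qquad \text{for BPSK}$$

$$P_b \le \frac{1}{k} \frac{\partial}{\partial Y} B(Y, Z)\bigg|_{Y=1,\, Z=\exp(-R_c\mathcal{E}_b/2N_0)} \qquad \text{for orthogonal BFSK} \tag{7.4-9}$$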


Soft Decision Decoding with Noncoherent Detection 

In noncoherent detection of binary orthogonal FSK signaling, the performance is 
further degraded by the noncoherent combining loss. Here the input variables to the 
decoder are 

{roj = \VZ+Noj\ 2 
\roj = \N 1J \ 2 

for $j = 1, 2, \ldots, n$, where $\{N_{0j}\}$ and $\{N_{1j}\}$ represent complex-valued mutually
statistically independent Gaussian random variables with zero mean and variance $2N_0$. The
correlation metric $CM_1$ is given as

$$CM_1 = \sum_{j=1}^{n} r_{0j} \tag{7.4-11}$$

while the correlation metric corresponding to the codeword having weight $w_m$ is
statistically equivalent to the correlation metric of a codeword in which $c_{mj} = 1$ for
$1 \le j \le w_m$ and $c_{mj} = 0$ for $w_m + 1 \le j \le n$. Hence, $CM_m$ may be expressed as

$$CM_m = \sum_{j=1}^{w_m} r_{1j} + \sum_{j=w_m+1}^{n} r_{0j} \tag{7.4-12}$$

The difference between $CM_1$ and $CM_m$ is

$$CM_1 - CM_m = \sum_{j=1}^{w_m} (r_{0j} - r_{1j}) \tag{7.4-13}$$

and the pairwise error probability (PEP) is simply the probability that $CM_1 - CM_m < 0$.
But this difference is a special case of the general quadratic form in complex-valued
Gaussian random variables considered in Chapter 11 and in Appendix B. The expression
for the probability of error in deciding between $CM_1$ and $CM_m$ is (see Section 11.1-1)

$$P_2(m) = \frac{1}{2^{2w_m - 1}} \exp\!\left(-\tfrac{1}{2}\gamma_b R_c w_m\right) \sum_{i=0}^{w_m - 1} K_i \left(\tfrac{1}{2}\gamma_b R_c w_m\right)^i \tag{7.4-14}$$

where, by definition,

$$K_i = \frac{1}{i!} \sum_{r=0}^{w_m - 1 - i} \binom{2w_m - 1}{r} \tag{7.4-15}$$

The union bound obtained by summing $P_2(m)$ over $2 \le m \le M$ provides us with an
upper bound on the probability of a codeword error.
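
The pairwise error probability for noncoherent detection is tedious to evaluate by hand, so a numerical sketch may help. The code below assumes the form $P_2(m) = 2^{-(2w_m-1)}\exp(-x)\sum_{i=0}^{w_m-1} K_i x^i$ with $x = \frac{1}{2}\gamma_b R_c w_m$ and $K_i = \frac{1}{i!}\sum_{r=0}^{w_m-1-i}\binom{2w_m-1}{r}$ for Equations 7.4-14 and 7.4-15; the function names and the example SNR are hypothetical. Two sanity checks support this form: for $w_m = 1$ it reduces to $\frac{1}{2}e^{-\gamma_b R_c/2}$, the familiar result for noncoherent binary FSK, and for $\gamma_b = 0$ it equals $\frac{1}{2}$.

```python
import math

def pep_noncoherent(w, gamma_b, Rc):
    """P_2(m) for a codeword of weight w; gamma_b is the SNR per bit (linear)."""
    x = 0.5 * gamma_b * Rc * w
    total = 0.0
    for i in range(w):
        # K_i sums binomial coefficients for r = 0, ..., w - 1 - i.
        K_i = sum(math.comb(2 * w - 1, r) for r in range(w - i)) / math.factorial(i)
        total += K_i * x**i
    return math.exp(-x) * total / 2 ** (2 * w - 1)

def union_bound(weights, gamma_b, Rc):
    """Sum of P_2 over all nonzero codewords, given {weight: count}."""
    return sum(a * pep_noncoherent(w, gamma_b, Rc)
               for w, a in weights.items() if w > 0)

# Weight distribution of the (7, 4) code from Example 7.2-2.
A = {0: 1, 3: 7, 4: 7, 7: 1}
gamma_b = 10 ** (12.0 / 10)  # 12 dB, an arbitrary illustrative value
print(union_bound(A, gamma_b, Rc=4 / 7))
```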

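As an alternative, we may use the minimum distance instead of the weight
distribution to obtain the looser upper bound

$$P_e \le \frac{M - 1}{2^{2d_{\min} - 1}} \exp\!\left(-\tfrac{1}{2}\gamma_b R_c d_{\min}\right) \sum_{i=0}^{d_{\min} - 1} K_i \left(\tfrac{1}{2}\gamma_b R_c d_{\min}\right)^i \tag{7.4-16}$$

where $K_i$ is evaluated with $w_m$ replaced by $d_{\min}$.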

A measure of the noncoherent combining loss inherent in the square-law detection
and combining of the $n$ elementary binary FSK waveforms in a codeword can be
obtained from Figure 11.1-1, where $d_{\min}$ is used in place of $L$. The loss obtained is
relative to the case in which the $n$ elementary binary FSK waveforms are first detected
coherently and combined, and then the sums are square-law-detected or envelope-detected
to yield the $M$ decision variables. The binary error probability for the latter
case is

$$P_2(m) = \frac{1}{2} \exp\!\left(-\tfrac{1}{2}\gamma_b R_c w_m\right) \tag{7.4-17}$$

and hence

$$P_e \le \sum_{m=2}^{M} P_2(m) \tag{7.4-18}$$

If $d_{\min}$ is used instead of the weight distribution, the union bound for the codeword
error probability in the latter case is

$$P_e \le \frac{1}{2}(M - 1) \exp\!\left(-\tfrac{1}{2}\gamma_b R_c d_{\min}\right) \tag{7.4-19}$$

similar to Equation 7.4-8.

We have previously seen in Equation 7.1-10 that the channel bandwidth required
to transmit the coded waveforms, when binary PSK is used to transmit each bit, is
given by

$$W = \frac{R}{R_c} \tag{7.4-20}$$

From Equation 4.6-7, the bandwidth requirement for an uncoded BPSK scheme is $R$.
Therefore, the bandwidth expansion factor $B_e$ for the coded waveforms is

$$B_e = \frac{1}{R_c} \tag{7.4-21}$$

Comparison with Orthogonal Signaling 

We are now in a position to compare the performance characteristics and bandwidth
requirements of coded signaling with orthogonal signaling. As we have seen in
Chapter 4, orthogonal signals are more power-efficient than BPSK signaling, but
using them requires large bandwidth. We have also seen that using coded BPSK signals
results in a moderate expansion in bandwidth and, at the same time, by providing the
coding gain, improves the power efficiency of the system.

Let us consider two systems, one employing orthogonal signaling and one employing
coded BPSK signals to achieve the same performance. We use the bounds given
in Equations 4.4-17 and 7.4-7 to compare the error probabilities of orthogonal and
coded BPSK signals, respectively. To have equal bounds on the error probability, we
must have $k = 2R_c d_{\min}$. Under this condition, the dimensionality of the orthogonal
signals, $N = M = 2^k$, is given by $N = 2^{2R_c d_{\min}}$. The dimensionality of the
BPSK code waveform is $n = k/R_c = 2d_{\min}$. Since dimensionality is proportional to
the bandwidth, we conclude that

$$\frac{W_{\text{orthogonal}}}{W_{\text{coded BPSK}}} = \frac{2^{2R_c d_{\min}}}{2d_{\min}} \tag{7.4-22}$$

For example, suppose we use a (63, 30) binary code that has a minimum distance
$d_{\min} = 13$. The bandwidth ratio for orthogonal signaling relative to this code, given by
Equation 7.4-22, is roughly 205. In other words, an orthogonal signaling scheme that
performs similarly to the (63, 30) code requires 205 times the bandwidth of the coded
system. This example clearly shows the bandwidth efficiency of coded systems.
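
The figure of 205 quoted above follows directly from Equation 7.4-22. A short sketch of the arithmetic, with an illustrative function name that is not part of the text:

```python
def bandwidth_ratio(n: int, k: int, d_min: int) -> float:
    """Ratio W_orthogonal / W_coded_BPSK of Equation 7.4-22."""
    rate = k / n  # code rate R_c
    return 2.0 ** (2 * rate * d_min) / (2 * d_min)


# The (63, 30) code with d_min = 13 used in the text.
ratio = bandwidth_ratio(63, 30, 13)
print(f"bandwidth ratio = {ratio:.1f}")
```

Because the ratio grows exponentially in $R_c d_{\min}$, even modest codes lead to very large bandwidth savings relative to orthogonal signaling of comparable performance.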


7.5 HARD DECISION DECODING OF LINEAR BLOCK CODES

The bounds given in Section 7.4 on the performance of coded signaling waveforms 
on the AWGN channel are based on the premise that the samples from the matched 
filter or cross-correlator are not quantized. Although this processing yields the best 
performance, the basic limitation is the computational burden of forming M correlation 
metrics and comparing these to obtain the largest. The amount of computation becomes 
excessive when the number M of codewords is large. 

To reduce the computational burden, the analog samples can be quantized and
the decoding operations are then performed digitally. In this section, we consider the
extreme situation in which each sample corresponding to a single bit of a codeword is
quantized to two levels: 0 and 1. That is, a hard decision is made as to whether each
transmitted bit in a codeword is a 0 or a 1. The resulting discrete-time channel (consisting
of the modulator, the AWGN channel, and the demodulator) constitutes a
BSC with crossover probability p. If coherent PSK is employed in transmitting and
receiving the bits in each codeword, then

$$p = Q\left(\sqrt{2\gamma_b R_c}\right) \tag{7.5-1}$$

On the other hand, if FSK is used to transmit the bits in each codeword, then

$$p = Q\left(\sqrt{\gamma_b R_c}\right) \tag{7.5-2}$$

for coherent detection and

$$p = \tfrac{1}{2} \exp\left(-\tfrac{1}{2}\gamma_b R_c\right) \tag{7.5-3}$$

for noncoherent detection.

Minimum-Distance (Maximum-Likelihood) Decoding 

The n bits from the detector corresponding to a received codeword are passed to the
decoder, which compares the received codeword with the M possible transmitted
codewords and decides in favor of the codeword that is closest in Hamming distance
(number of bit positions in which two codewords differ) to the received codeword. This
minimum-distance decoding rule is optimum in the sense that it results in a minimum
probability of a codeword error for the binary symmetric channel.

A conceptually simple, albeit computationally inefficient, method for decoding is
to first add (modulo-2) the received codeword vector to all the M possible transmitted
codewords $c_m$ to obtain the error vectors $e_m$. Hence, $e_m$ represents the error event
that must have occurred on the channel in order to transform the codeword $c_m$ to the
particular received codeword. The number of errors in transforming $c_m$ into the received
codeword is just equal to the number of 1s in $e_m$. Thus, if we simply compute the weight
of each of the M error vectors $\{e_m\}$ and decide in favor of the codeword that results in the
smallest weight error vector, we have, in effect, a realization of the minimum-distance
decoding rule.
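
The exhaustive procedure just described can be sketched in a few lines. This is only an illustration: words are packed into Python integers, the four-word codebook is an example chosen for this sketch, and ties between equally close codewords are broken by list order, which is an arbitrary choice.

```python
def hamming_weight(v: int) -> int:
    """Number of 1s in the binary representation of v."""
    return bin(v).count("1")


def min_distance_decode(y: int, codewords: list[int]) -> int:
    """Return the codeword whose error vector e_m = y XOR c_m has least weight."""
    return min(codewords, key=lambda c: hamming_weight(y ^ c))


# Illustrative codebook of length-5 words (an assumption for this sketch).
codebook = [0b00000, 0b01011, 0b10101, 0b11110]
decoded = min_distance_decode(0b10100, codebook)
print(format(decoded, "05b"))
```

The cost is one XOR and one weight computation per codeword, so the work grows as $M = 2^k$, which is what motivates the syndrome-based method that follows.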

Syndrome and Standard Array 

A more efficient method for hard decision decoding makes use of the parity check
matrix H. To elaborate, suppose that $c_m$ is the transmitted codeword and y is the
received sequence at the output of the detector. In general, y may be expressed as

$$y = c_m + e$$

where e denotes an arbitrary binary error vector. The product $yH^t$ yields

$$s = yH^t = c_m H^t + eH^t = eH^t \tag{7.5-4}$$

where the (n - k)-dimensional vector s is called the syndrome of the error pattern. In
other words, the vector s has components that are zero for all parity check equations
that are satisfied and nonzero for all parity check equations that are not satisfied. Thus,
s contains the pattern of failures in the parity checks.

We emphasize that the syndrome s is a characteristic of the error pattern and not of
the transmitted codeword. If a syndrome is equal to zero, then the error pattern is equal
to one of the codewords. In this case we have an undetected error. Therefore, an error
pattern remains undetected if it is equal to one of the nonzero codewords. Hence, from
the $2^n - 1$ error patterns (the all-zero sequence does not count as an error), $2^k - 1$ are
not detectable; the remaining $2^n - 2^k$ nonzero error patterns can be detected, but not all
can be corrected because there are only $2^{n-k}$ syndromes and, consequently, different
error patterns result in the same syndrome. For ML decoding we are looking for the
error pattern of least weight among all possible error patterns.

Suppose we construct a decoding table in which we list all the $2^k$ possible codewords
in the first row, beginning with the all-zero codeword $c_1 = 0$ in the first (leftmost)
column. This all-zero codeword also represents the all-zero error pattern. After
completing the first row, we put a sequence of length n which has not been included in
the first row (i.e., is not a codeword) and among all such sequences has the minimum
weight in the first column of the second row, and we call it $e_2$. We complete the second
row of the table by adding $e_2$ to all codewords and putting the result in the column
corresponding to that codeword. After the second row is complete, we look among all
sequences of length n that have not been included in the first two rows and choose
a sequence of minimum weight, call it $e_3$, and put it in the first column of the third
row; and complete the third row similar to the way we completed the second row. This
process is continued until all sequences of length n are used in the table. We obtain a
$2^{n-k} \times 2^k$ table as follows:

$$\begin{array}{ccccc}
c_1 = 0 & c_2 & c_3 & \cdots & c_{2^k} \\
e_2 & c_2 + e_2 & c_3 + e_2 & \cdots & c_{2^k} + e_2 \\
e_3 & c_2 + e_3 & c_3 + e_3 & \cdots & c_{2^k} + e_3 \\
\vdots & \vdots & \vdots & & \vdots \\
e_{2^{n-k}} & c_2 + e_{2^{n-k}} & c_3 + e_{2^{n-k}} & \cdots & c_{2^k} + e_{2^{n-k}}
\end{array}$$

This table is called a standard array. Each row, including the first, consists of $2^k$ possible
received sequences that would result from the corresponding error pattern in the first
column. Each row is called a coset, and the first (leftmost) codeword (or error pattern) is
called a coset leader. Therefore, a coset consists of all the possible received sequences
resulting from a particular error pattern (coset leader). Also note that by construction
the coset leader has the lowest weight among all coset members.

EXAMPLE 7.5-1. Let us construct the standard array for the (5, 2) systematic code with
generator matrix given by

$$G = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \end{bmatrix}$$

This code has a minimum distance $d_{\min} = 3$. The standard array is given in Table 7.5-1.
Note that in this code, the coset leaders consist of the all-zero error pattern, five error
patterns of weight 1, and two error patterns of weight 2. Although many more double
error patterns exist, there is room for only two to complete the table.

TABLE 7.5-1
The Standard Array for Example 7.5-1

00000    01011    10101    11110
00001    01010    10100    11111
00010    01001    10111    11100
00100    01111    10001    11010
01000    00011    11101    10110
10000    11011    00101    01110
11000    10011    01101    00110
10010    11001    00111    01100
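
The construction of the standard array can be sketched programmatically. The code below follows the row-by-row procedure described in the text; the tie-breaking rule among unused sequences of equal weight is an assumption made here (the text leaves the choice free), picked so that the coset leaders coincide with those of Table 7.5-1, although the weight-1 rows come out in a different order.

```python
from itertools import product


def add_mod2(a, b):
    """Componentwise modulo-2 sum of two binary tuples."""
    return tuple(x ^ y for x, y in zip(a, b))


def codewords_from_generator(G):
    """All 2^k codewords uG (mod 2); the all-zero codeword comes first."""
    k = len(G)
    columns = list(zip(*G))
    return [
        tuple(sum(u * g for u, g in zip(msg, col)) % 2 for col in columns)
        for msg in product((0, 1), repeat=k)
    ]


def standard_array(codewords):
    """Build the rows (cosets) of the standard array."""
    n = len(codewords[0])
    unused = set(product((0, 1), repeat=n)) - set(codewords)
    rows = [list(codewords)]
    while unused:
        # Minimum-weight unused sequence; ties broken arbitrarily (assumption).
        leader = min(unused, key=lambda v: (sum(v), tuple(1 - b for b in v)))
        row = [add_mod2(leader, c) for c in codewords]
        rows.append(row)
        unused -= set(row)
    return rows


G = [(1, 0, 1, 0, 1), (0, 1, 0, 1, 1)]
array = standard_array(codewords_from_generator(G))
leaders = [row[0] for row in array]
for row in array:
    print("  ".join("".join(map(str, word)) for word in row))
```

Each row is obtained by adding its leader to every codeword, so the 8 rows of 4 sequences partition all 32 binary sequences of length 5.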

Now, suppose that $e_i$ is a coset leader and that $c_m$ was the transmitted codeword.
Then the error pattern $e_i$ would result in the received sequence

$$y = c_m + e_i$$

The syndrome is

$$s = yH^t = (c_m + e_i)H^t = c_m H^t + e_i H^t = e_i H^t$$

Clearly, all received sequences in the same coset have the same syndrome, since the latter
depends only on the error pattern. Furthermore, each coset has a different syndrome.
This means that there exists a one-to-one correspondence between cosets (or coset
leaders) and syndromes.

The process of decoding the received sequence y basically involves finding the error
sequence of the lowest weight $e_i$ such that $s = yH^t = e_i H^t$. Since each syndrome
s corresponds to a single coset, the error sequence $e_i$ is simply the lowest-weight member of
the coset, i.e., the coset leader. Therefore, after the syndrome is found, it is sufficient
to find the coset leader corresponding to the syndrome and add the coset leader to y to
obtain the most likely transmitted codeword.

The above discussion makes it clear that coset leaders are the only error patterns
that are correctable. To sum up the above discussion, from all possible $2^n - 1$ nonzero
error patterns, $2^k - 1$ corresponding to nonzero codewords are not detectable, and
$2^n - 2^k$ are detectable of which only $2^{n-k} - 1$ are correctable.

EXAMPLE 7.5-2. Consider the (5, 2) code with the standard array given in Table 7.5-1.
The syndromes versus the most likely error patterns are given in Table 7.5-2.

Now suppose the actual error vector on the channel is

$$e = (1\ 0\ 1\ 0\ 0)$$

The syndrome computed for the error is s = (0 0 1). Hence, the error determined
from the table is $\hat{e}$ = (0 0 0 0 1). When $\hat{e}$ is added to y, the result is a decoding
error. In other words, the (5, 2) code corrects all single errors and only two double
errors, namely, (1 1 0 0 0) and (1 0 0 1 0).

TABLE 7.5-2
Syndromes and Coset Leaders for Example 7.5-2

Syndrome    Error Pattern
000         00000
001         00001
010         00010
100         00100
011         01000
101         10000
110         11000
111         10010
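
Syndrome decoding for this example can be sketched as follows. The parity check matrix H used here is not written out in the text; it is an assumption, taken as $H = [P^t \mid I]$ for the systematic generator matrix $G = [I \mid P]$ of Example 7.5-1, and it reproduces the syndromes of Table 7.5-2. The tie-breaking among equal-weight error patterns is also an assumption, chosen to match the coset leaders in the table.

```python
from itertools import product

# Assumed parity check matrix for the (5, 2) code of Example 7.5-1.
H = [(1, 0, 1, 0, 0),
     (0, 1, 0, 1, 0),
     (1, 1, 0, 0, 1)]


def syndrome(v, H):
    """s = v H^t over GF(2): one parity check per row of H."""
    return tuple(sum(a * b for a, b in zip(v, row)) % 2 for row in H)


def build_syndrome_table(H):
    """Map each syndrome to a minimum-weight error pattern (coset leader)."""
    n = len(H[0])
    patterns = sorted(product((0, 1), repeat=n),
                      key=lambda v: (sum(v), tuple(1 - b for b in v)))
    table = {}
    for e in patterns:
        table.setdefault(syndrome(e, H), e)  # keep the first (lightest) pattern
    return table


def syndrome_decode(y, H, table):
    """Add the coset leader selected by the syndrome to the received word."""
    e_hat = table[syndrome(y, H)]
    return tuple(a ^ b for a, b in zip(y, e_hat))


table = build_syndrome_table(H)
c = (0, 1, 0, 1, 1)                      # transmitted codeword
e = (1, 0, 1, 0, 0)                      # double error from Example 7.5-2
y = tuple(a ^ b for a, b in zip(c, e))   # received sequence
print(syndrome(y, H), table[syndrome(y, H)], syndrome_decode(y, H, table))
```

For the double error of the example the decoder returns a valid codeword that differs from the transmitted one, which is the decoding error described in the text; a single error, by contrast, is corrected.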


7.5-1 Error Detection and Error Correction Capability of Block Codes 

It is clear from the discussion above that when the syndrome consists of all zeros, the
received codeword is one of the $2^k$ possible transmitted codewords. Since the minimum
separation between a pair of codewords is $d_{\min}$, it is possible for an error pattern of weight
$d_{\min}$ to transform one of these $2^k$ codewords in the code to another codeword. When this
happens, we have an undetected error. On the other hand, if the actual number of errors
is less than $d_{\min}$, the syndrome will have a nonzero weight. When this occurs, we have
detected the presence of one or more errors on the channel. Clearly, the (n, k) block code
is capable of detecting up to $d_{\min} - 1$ errors. Error detection may be used in conjunction
with an automatic repeat-request (ARQ) scheme for retransmission of the codeword.

The error correction capability of a code also depends on the minimum distance.
However, the number of correctable error patterns is limited by the number of possible
syndromes or coset leaders in the standard array. To determine the error correction
capability of an (n, k) code, it is convenient to view the $2^k$ codewords as points in an
n-dimensional space. If each codeword is viewed as the center of a sphere of radius
(Hamming distance) t, the largest value that t may have without intersection (or
tangency) of any pair of the $2^k$ spheres is $t = \lfloor \frac{1}{2}(d_{\min} - 1) \rfloor$, where $\lfloor x \rfloor$ denotes the largest
integer contained in x. Within each sphere lie all the possible received codewords of
distance less than or equal to t from the valid codeword. Consequently, any received
code vector that falls within a sphere is decoded into the valid codeword at the center of
the sphere. This implies that an (n, k) code with minimum distance $d_{\min}$ is capable of
correcting $t = \lfloor \frac{1}{2}(d_{\min} - 1) \rfloor$ errors. Figure 7.5-1 is a two-dimensional representation
of the codewords and the spheres.

As described above, a code may be used to detect $d_{\min} - 1$ errors or to correct
$t = \lfloor \frac{1}{2}(d_{\min} - 1) \rfloor$ errors. Clearly, to correct t errors implies that we have detected t
errors. However, it is also possible to detect more than t errors if we compromise in the
error correction capability of the code. For example, a code with $d_{\min} = 7$ can correct
up to t = 3 errors. If we wish to detect four errors, we can do so by reducing the radius
of the sphere around each codeword from 3 to 2. Thus, patterns with four errors are
detectable, but only patterns of two errors are correctable. In other words, when only
two errors occur, these are corrected; and when three or four errors occur, the receiver
may ask for a retransmission. If more than four errors occur, they will go undetected if
the codeword falls within a sphere of radius 2. Similarly, for $d_{\min} = 7$, five errors can
be detected and one error corrected. In general, a code with minimum distance $d_{\min}$ can
detect $e_d$ errors and correct $e_c$ errors, where

$$e_d + e_c \le d_{\min} - 1$$

and

$$e_c \le e_d$$
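
The trade-off between detection and correction can be enumerated directly from the two conditions above. A small sketch (the function name is illustrative) that lists, for each number of corrected errors $e_c$, the largest number of detectable errors $e_d$:

```python
def detect_correct_pairs(d_min: int) -> list[tuple[int, int]]:
    """Pairs (e_c, e_d) with e_d + e_c = d_min - 1 and e_c <= e_d."""
    t = (d_min - 1) // 2  # largest number of correctable errors
    return [(e_c, d_min - 1 - e_c) for e_c in range(t + 1)]


for e_c, e_d in detect_correct_pairs(7):
    print(f"correct up to {e_c}, detect up to {e_d}")
```

For $d_{\min} = 7$ this reproduces the cases discussed in the text: correcting two errors while detecting four, or correcting one error while detecting five.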


7.5-2 Block and Bit Error Probability for Hard Decision Decoding 

In this section we derive bounds on the probability of error for hard decision decoding
of linear binary block codes based on error correction only.

From the above discussion, it is clear that the optimum decoder for a binary
symmetric channel will decode correctly if (but not necessarily only if) the number of errors
in a codeword is less than one-half the minimum distance $d_{\min}$ of the code. That is, any
number of errors up to

$$t = \left\lfloor \tfrac{1}{2}(d_{\min} - 1) \right\rfloor$$

is always correctable. Since the binary symmetric channel is memoryless, the bit errors
occur independently. Hence, the probability of m errors in a block of n bits is

$$P(m, n) = \binom{n}{m} p^m (1 - p)^{n-m} \tag{7.5-5}$$

and, therefore, the probability of a codeword error is upper-bounded by the expression

$$P_e \le \sum_{m=t+1}^{n} P(m, n) \tag{7.5-6}$$

For high signal-to-noise ratios, i.e., small values of p, Equation 7.5-6 can be
approximated by its first term, and we have

$$P_e \approx \binom{n}{t+1} p^{t+1} (1 - p)^{n-t-1} \tag{7.5-7}$$

This equation states that when 0 is transmitted, the probability of error is almost entirely
equal to the probability of receiving sequences of weight t + 1. To derive an approximate
bound on the error probability of each binary symbol in a codeword, we note that if 0
is sent and a sequence of weight t + 1 is received, the decoder will decode the received
sequence of weight t + 1 to a codeword at a distance at most t from the received
sequence and hence a distance of at most 2t + 1 from 0. But since the minimum weight
of the code is 2t + 1, the decoded codeword has to be of weight 2t + 1. This means
that for each highly probable block error we have 2t + 1 bit errors in the codeword
components; hence from Equation 7.5-7 we obtain

$$P_b \approx \frac{2t+1}{n} \binom{n}{t+1} p^{t+1} (1 - p)^{n-t-1} \tag{7.5-8}$$
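
Equations 7.5-5 through 7.5-8 are straightforward to evaluate. The following sketch uses illustrative function names; the crossover probability p and the parameters n and t are supplied by the caller.

```python
from math import comb


def prob_m_errors(m: int, n: int, p: float) -> float:
    """P(m, n) of Equation 7.5-5: exactly m errors in n uses of the BSC."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def block_error_bound(n: int, t: int, p: float) -> float:
    """Upper bound on the codeword error probability, Equation 7.5-6."""
    return sum(prob_m_errors(m, n, p) for m in range(t + 1, n + 1))


def block_error_first_term(n: int, t: int, p: float) -> float:
    """High-SNR approximation, Equation 7.5-7."""
    return prob_m_errors(t + 1, n, p)


def bit_error_approx(n: int, t: int, p: float) -> float:
    """Approximate bit error probability, Equation 7.5-8."""
    return (2 * t + 1) / n * prob_m_errors(t + 1, n, p)


# Illustration: a single-error-correcting code of length 7 at p = 0.01.
bound = block_error_bound(7, 1, 0.01)
first = block_error_first_term(7, 1, 0.01)
print(bound, first, bit_error_approx(7, 1, 0.01))
```

At small p the first term accounts for nearly all of the bound, which is what justifies the approximation in Equation 7.5-7.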


Equality holds in Equation 7.5-6 if the linear block code is a perfect code. To
describe the basic characteristics of a perfect code, suppose we place a sphere of radius
t around each of the possible transmitted codewords. Each sphere around a codeword
contains the set of all codewords of Hamming distance less than or equal to t from the
codeword. Now, the number of codewords in a sphere of radius $t = \lfloor \frac{1}{2}(d_{\min} - 1) \rfloor$ is

$$1 + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{t} = \sum_{i=0}^{t} \binom{n}{i} \tag{7.5-9}$$

Since there are $M = 2^k$ possible transmitted codewords, there are $2^k$ nonoverlapping
spheres, each having a radius t. The total number of codewords enclosed in the $2^k$
spheres cannot exceed the $2^n$ possible received codewords. Thus, a t-error correcting
code must satisfy the inequality

$$2^k \sum_{i=0}^{t} \binom{n}{i} \le 2^n \tag{7.5-10}$$

or, equivalently,

$$2^{n-k} \ge \sum_{i=0}^{t} \binom{n}{i} \tag{7.5-11}$$

A perfect code has the property that all spheres of Hamming distance
$t = \lfloor \frac{1}{2}(d_{\min} - 1) \rfloor$ around the $M = 2^k$ possible transmitted codewords are disjoint
and every received codeword falls in one of the spheres. Thus, every received codeword
is at most at a distance t from one of the possible transmitted codewords, and
Equation 7.5-11 holds with equality. For such a code, all error patterns of weight less
than or equal to t are corrected by the optimum (minimum-distance) decoder. On the
other hand, any error pattern of weight t + 1 or greater cannot be corrected.
Consequently, the expression for the error probability given in Equation 7.5-6 holds with
equality. The reader can easily verify that the Hamming codes, which have the
parameters $n = 2^{n-k} - 1$, $d_{\min} = 3$, and t = 1, are an example of perfect codes. The (23, 12)
Golay code has parameters $d_{\min} = 7$ and t = 3. It can be easily verified that this code
is also a perfect code. These two nontrivial codes and the trivial code consisting of two
codewords of odd length n and $d_{\min} = n$ are the only perfect binary block codes.
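
The verification left to the reader amounts to checking Equation 7.5-11 with equality. A minimal sketch, with illustrative function names:

```python
from math import comb


def sphere_volume(n: int, t: int) -> int:
    """Number of n-bit sequences within Hamming distance t of a codeword (Eq. 7.5-9)."""
    return sum(comb(n, i) for i in range(t + 1))


def satisfies_sphere_packing(n: int, k: int, t: int) -> bool:
    """Inequality of Equation 7.5-11."""
    return sphere_volume(n, t) <= 2 ** (n - k)


def is_perfect(n: int, k: int, t: int) -> bool:
    """True when Equation 7.5-11 holds with equality."""
    return sphere_volume(n, t) == 2 ** (n - k)


print(is_perfect(7, 4, 1))    # (7, 4) Hamming code
print(is_perfect(23, 12, 3))  # (23, 12) Golay code
print(is_perfect(5, 2, 1))    # (5, 2) code of Example 7.5-1
```

For the Golay code the sphere volume is 1 + 23 + 253 + 1771 = 2048 = $2^{11}$, so the spheres fill the space exactly, whereas the (5, 2) code of Example 7.5-1 leaves sequences outside its spheres of radius 1.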

A quasi-perfect code is characterized by the property that all spheres of Hamming
radius t around the M possible transmitted codewords are disjoint and every received
codeword is at most at a distance t + 1 from one of the possible transmitted codewords.
For such a code, all error patterns of weight less than or equal to t and some error
patterns of weight t + 1 are correctable, but any error pattern of weight t + 2 or greater
leads to incorrect decoding of the codeword. Clearly, Equation 7.5-6 is an upper bound
on the error probability, and

$$P_e \ge \sum_{m=t+2}^{n} P(m, n)$$

is a lower bound.

A more precise measure of the performance for quasi-perfect codes can be obtained
by making use of the inequality in Equation 7.5-11. That is, the total number of
codewords outside the $2^k$ spheres of radius t is

$$N_{t+1} = 2^n - 2^k \sum_{i=0}^{t} \binom{n}{i} \tag{7.5-12}$$

If these codewords are equally subdivided into $2^k$ sets and each set is associated with
one of the $2^k$ spheres, then each sphere is enlarged by the addition of

$$\beta_{t+1} = 2^{n-k} - \sum_{i=0}^{t} \binom{n}{i} \tag{7.5-13}$$

codewords having distance t + 1 from the transmitted codeword. Consequently, of
the $\binom{n}{t+1}$ error patterns of distance t + 1 from each codeword, we can correct $\beta_{t+1}$
error patterns. Thus, the error probability for decoding the quasi-perfect code may be
expressed as

$$P_e = \sum_{m=t+2}^{n} P(m, n) + \left[ \binom{n}{t+1} - \beta_{t+1} \right] p^{t+1} (1 - p)^{n-t-1} \tag{7.5-14}$$

Another pair of upper and lower bounds is obtained by considering two codewords
that differ by the minimum distance. First, we note that $P_e$ cannot be less than the
probability of erroneously decoding the transmitted codeword as its nearest neighbor,
which is at a distance $d_{\min}$ from the transmitted codeword. That is,

$$P_e \ge \sum_{m=\lfloor d_{\min}/2 \rfloor + 1}^{d_{\min}} \binom{d_{\min}}{m} p^m (1 - p)^{d_{\min}-m} \tag{7.5-15}$$

On the other hand, $P_e$ cannot be greater than $2^k - 1$ times the probability of erroneously
decoding the transmitted codeword as its nearest neighbor, which is at a distance $d_{\min}$
from the transmitted codeword. That is a union bound, which is expressed as

$$P_e \le (2^k - 1) \sum_{m=\lfloor d_{\min}/2 \rfloor + 1}^{d_{\min}} \binom{d_{\min}}{m} p^m (1 - p)^{d_{\min}-m} \tag{7.5-16}$$

When $M = 2^k$ is large, the lower bound in Equation 7.5-15 and the upper bound in
Equation 7.5-16 are very loose.

General bounds on block and bit error probabilities under hard decision decoding
are obtained by using relations derived in Equations 7.2-39, 7.2-43, and 7.2-48. The
value of $\Delta$ for hard decision decoding was found in Example 6.8-1 and is given by
$\Delta = \sqrt{4p(1 - p)}$. The results are

$$P_e \le \left( A(Z) - 1 \right) \Big|_{Z = \sqrt{4p(1-p)}} \tag{7.5-17}$$

$$P_e \le (2^k - 1) \left[ 4p(1 - p) \right]^{d_{\min}/2} \tag{7.5-18}$$

$$P_b \le \frac{1}{k} \frac{\partial}{\partial Y} B(Y, Z) \Big|_{Y = 1,\; Z = \sqrt{4p(1-p)}} \tag{7.5-19}$$


7.6 COMPARISON OF PERFORMANCE BETWEEN HARD DECISION
AND SOFT DECISION DECODING

It is both interesting and instructive to compare the bounds on the error rate performance
of linear block codes for soft decision decoding and hard decision decoding on an
AWGN channel. For illustrative purposes, we use the Golay (23, 12) code, which has
the relatively simple weight distribution given in Equation 7.3-15. As stated previously,
this code has a minimum distance $d_{\min} = 7$.

First we compute and compare the bounds on the error probability for hard decision
decoding. Since the Golay (23, 12) code is a perfect code, the exact error probability
for hard decision decoding is given by Equation 7.5-6 as

$$P_e = \sum_{m=4}^{23} \binom{23}{m} p^m (1 - p)^{23-m} = 1 - \sum_{m=0}^{3} \binom{23}{m} p^m (1 - p)^{23-m} \tag{7.6-1}$$

where p is the probability of a binary digit error for the binary symmetric channel.
Binary (or four-phase) coherent PSK is assumed to be the modulation/demodulation
technique for the transmission and reception of the binary digits contained in each
codeword. Thus, the appropriate expression for p is given by Equation 7.5-1. In addition
to the exact error probability given by Equation 7.6-1, we have the lower bound given
by Equation 7.5-15 and the three upper bounds given by Equations 7.5-16, 7.5-17,
and 7.5-18. Numerical results obtained from these bounds are compared with the
exact error probability in Figure 7.6-1. We observe that the lower bound is very loose.
At $P_e = 10^{-5}$, the lower bound is off by approximately 2 dB from the exact error
probability. All three upper bounds are very loose for error rates above $P_e = 10^{-2}$.
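
The exact error probability and the nearest-neighbor bounds for the Golay code can be computed with a few lines of code. The sketch below assumes coherent BPSK, so that $p = Q(\sqrt{2\gamma_b R_c})$, and takes $\gamma_b$ in dB; the function names are illustrative, and the Q-function is expressed through the complementary error function of the Python standard library.

```python
from math import comb, erfc, sqrt


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))


def crossover_bpsk(gamma_b_db: float, rate: float) -> float:
    """BSC crossover probability for coherent BPSK at SNR per bit gamma_b (dB)."""
    gamma_b = 10.0 ** (gamma_b_db / 10.0)
    return q_function(sqrt(2.0 * gamma_b * rate))


def golay_exact(p: float) -> float:
    """Exact codeword error probability of the (23, 12) Golay code (4 or more errors)."""
    return sum(comb(23, m) * p**m * (1 - p) ** (23 - m) for m in range(4, 24))


def nearest_neighbor_sum(p: float, d_min: int) -> float:
    """Sum shared by the lower bound 7.5-15 and the union bound 7.5-16."""
    return sum(comb(d_min, m) * p**m * (1 - p) ** (d_min - m)
               for m in range(d_min // 2 + 1, d_min + 1))


def golay_bounds(p: float) -> tuple[float, float]:
    """(lower bound 7.5-15, upper bound 7.5-16) for k = 12 and d_min = 7."""
    lower = nearest_neighbor_sum(p, 7)
    return lower, (2**12 - 1) * lower


p = crossover_bpsk(6.0, 12 / 23)
lower, upper = golay_bounds(p)
print(p, lower, golay_exact(p), upper)
```

The direct sum over m = 4, ..., 23 is used in place of the complementary form to avoid loss of precision when p is small.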

It is also interesting to compare the performance between soft and hard decision
decoding. For this comparison, we use the upper bounds on the error probability for
soft decision decoding given by Equation 7.4-7 and the exact error probability for hard
decision decoding given by Equation 7.6-1. Figure 7.6-2 illustrates these performance
characteristics. We observe that the two bounds for soft decision decoding differ by
approximately 0.5 dB at $P_e = 10^{-6}$ and by approximately 1 dB at $P_e = 10^{-2}$. We also
observe that the difference in performance between hard and soft decision decoding
is approximately 2 dB in the range $10^{-6} < P_e < 10^{-2}$. In the range $P_e > 10^{-2}$, the
curve of the error probability for hard decision decoding crosses the curves for the
bounds. This behavior indicates that the bounds for soft decision decoding are loose
when $P_e > 10^{-2}$.

FIGURE 7.6-1
Comparison of bounds with exact error probability for hard decision decoding of Golay
(23, 12) code. (Horizontal axis: SNR per bit, $\gamma_b$ (dB).)

FIGURE 7.6-2
Comparison of soft-decision decoding versus hard-decision decoding for a (23, 12) Golay
code. (Horizontal axis: SNR per bit, $\gamma_b$ (dB).)

As we observed in Example 6.8-3 and Figure 6.8-4, there exists a roughly 2-dB
gap between the cutoff rates of a BPSK modulated scheme under soft and hard decision
decoding. A similar gap also exists between the capacities in these two cases. This result
can be shown directly by noting that the capacity of a BSC, corresponding to hard
decision decoding, is given by Equation 6.5-29 as

$$C = 1 - H_2(p) = 1 + p \log_2 p + (1 - p) \log_2 (1 - p) \tag{7.6-2}$$

where

$$p = Q\left(\sqrt{2\gamma_b R_c}\right) \tag{7.6-3}$$

For small values of $R_c$ we can use the approximation

$$Q(\epsilon) \approx \frac{1}{2} - \frac{\epsilon}{\sqrt{2\pi}}, \qquad \epsilon > 0 \tag{7.6-4}$$

to obtain

$$p \approx \frac{1}{2} - \sqrt{\frac{\gamma_b R_c}{\pi}} \tag{7.6-5}$$

Substituting this result into Equation 7.6-2 and using the approximation

$$\log_2 (1 + x) \approx \frac{x - \frac{1}{2}x^2}{\ln 2} \tag{7.6-6}$$

we obtain

$$C = \frac{2}{\pi \ln 2} \gamma_b R_c \tag{7.6-7}$$

Now we set $C = R_c$. Thus, in the limit as $R_c$ approaches zero, we obtain the result

$$\gamma_b = \tfrac{1}{2} \pi \ln 2 \approx 0.37 \text{ dB} \tag{7.6-8}$$

The capacity of the binary-input AWGN channel with soft decision decoding can
be computed in a similar manner. The expression for the capacity in bits per code
symbol, derived in Equations 6.5-30 to 6.5-32, can be approximated for low values of
$R_c$ as

$$C \approx \frac{\gamma_b R_c}{\ln 2} \tag{7.6-9}$$

Again, we set $C = R_c$. Thus, as $R_c \to 0$, the minimum SNR per bit to achieve capacity
is

$$\gamma_b = \ln 2 \approx -1.6 \text{ dB} \tag{7.6-10}$$

Equations 7.6-8 and 7.6-10 clearly show that at low SNR values there exists roughly a
2-dB difference between the performance of hard and soft decision decoding. As seen
from Figure 6.8-4, increasing SNR results in a decrease in the performance difference
between hard and soft decision decoding. For example, at $R_c = 0.8$, the difference
reduces to about 1.5 dB.
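
The two limiting values and their difference can be checked numerically. In this short sketch the variable names are illustrative:

```python
from math import log, log10, pi


def to_db(x: float) -> float:
    """Convert a power ratio to decibels."""
    return 10.0 * log10(x)


hard_limit_db = to_db(0.5 * pi * log(2))  # Equation 7.6-8
soft_limit_db = to_db(log(2))             # Equation 7.6-10
gap_db = hard_limit_db - soft_limit_db    # equals 10 log10(pi / 2)

print(f"hard: {hard_limit_db:.2f} dB, soft: {soft_limit_db:.2f} dB, gap: {gap_db:.2f} dB")
```

The gap is exactly $10 \log_{10}(\pi/2)$, about 1.96 dB, which is the "roughly 2 dB" penalty of hard decisions at low code rates.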

The curves in Figure 6.8-4 provide more information than just the difference in
performance between soft and hard decision decoding. These curves also specify the
minimum SNR per bit that is required for a given code rate. For example, a code rate of
$R_c = 0.8$ can provide arbitrarily small error probability at an SNR per bit of 2 dB, when
soft decision decoding is used. By comparison, an uncoded binary PSK requires 9.6 dB
to achieve an error probability of $10^{-5}$. Hence, a 7.6-dB gain is possible by employing
a rate $R_c = 4/5$ code. This gain is obtained by expanding the bandwidth by 25% since
the bandwidth expansion factor of such a code is $1/R_c = 1.25$. To achieve such a
large coding gain usually implies the use of an extremely long block length code,
and generally a complex decoder. Nevertheless, the curves in Figure 6.8-4 provide
a benchmark for comparing the coding gains achieved by practically implementable
codes with the ultimate limits for either soft or hard decision decoding.


7.7 BOUNDS ON MINIMUM DISTANCE OF LINEAR BLOCK CODES

The expressions for the probability of error derived in this chapter for soft decision and
hard decision decoding of linear binary block codes clearly indicate the importance
of the minimum-distance parameter in the performance of the code. If we consider
soft decision decoding, for example, the upper bound on the error probability given by
Equation 7.4-7 indicates that, for a given code rate $R_c = k/n$, the probability of error
in an AWGN channel decreases exponentially with $d_{\min}$. When this bound is used in
conjunction with the lower bound on $d_{\min}$ given below, we obtain an upper bound on
$P_e$, the probability of a codeword error. Similarly, we may use the upper bound given by
Equation 7.5-6 for the probability of error for hard decision decoding in conjunction
with the lower bound on $d_{\min}$ to obtain an upper bound on the error probability for
linear binary block codes on the binary symmetric channel.

On the other hand, an upper bound on $d_{\min}$ can be used to determine a lower bound
on the probability of error achieved by the best code. For example, suppose that hard
decision decoding is employed. In this case, we can use Equation 7.5-15 in conjunction
with an upper bound on $d_{\min}$ to obtain a lower bound on $P_e$ for the best (n, k) code.
Thus, upper and lower bounds on $d_{\min}$ are important in assessing the capabilities of
codes. In this section we study some bounds on minimum distance of linear block
codes.


7.7-1 Singleton Bound 

The Singleton bound is obtained using the properties of the parity check matrix H.
Recall from the discussion in Section 7.2-2 that the minimum distance of a linear
block code is equal to the minimum number of columns of H, the parity check matrix,
that are linearly dependent. From this we conclude that any $d_{\min} - 1$ columns of H are
linearly independent, so the rank of the parity check matrix is at least $d_{\min} - 1$. Since
the parity check matrix is an $(n - k) \times n$ matrix, its rank is at most n - k. Hence,

$$d_{\min} - 1 \le n - k \tag{7.7-1}$$

or

$$d_{\min} \le n - k + 1 \tag{7.7-2}$$

The bound given in Equation 7.7-2 is called the Singleton bound. Since $d_{\min} - 1$ is
approximately twice the number of errors that a code can correct, from Equation 7.7-1
we conclude that the number of parity checks in a code must be at least equal to twice
the number of errors a code can correct. Although the proof of the Singleton bound
presented here was based on the linearity of the code, this bound applies to all block
codes, linear and nonlinear, binary and nonbinary.

Codes for which the Singleton bound is satisfied with equality, i.e., codes for which
$d_{\min} = n - k + 1$, are called maximum-distance separable, or MDS, codes. Repetition
codes and their duals are examples of MDS codes. In fact these codes are the only
binary MDS codes.† In the class of nonbinary codes, Reed-Solomon codes studied in
Section 7.11 are the most important examples of MDS codes.
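
A quick numerical check of the Singleton bound and the MDS property is sketched below; the function names are illustrative, and the code parameters are those of codes mentioned in this chapter together with a length-7 repetition code and its dual.

```python
def singleton_limit(n: int, k: int) -> int:
    """Largest minimum distance allowed by Equation 7.7-2."""
    return n - k + 1


def is_mds(n: int, k: int, d_min: int) -> bool:
    """True when the Singleton bound holds with equality."""
    if d_min > singleton_limit(n, k):
        raise ValueError("parameters violate the Singleton bound")
    return d_min == singleton_limit(n, k)


print(is_mds(7, 1, 7))    # repetition code of length 7
print(is_mds(7, 6, 2))    # its dual, the single-parity-check code
print(is_mds(7, 4, 3))    # (7, 4) Hamming code
print(is_mds(23, 12, 7))  # (23, 12) Golay code
```

The Hamming and Golay codes fall short of the Singleton limit, consistent with the statement that repetition codes and their duals are the only nontrivial binary MDS codes.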

Dividing both sides of the Singleton bound by n, we have

$$\frac{d_{\min}}{n} \le 1 - R_c + \frac{1}{n} \tag{7.7-3}$$

If we define

$$\delta_n = \frac{d_{\min}}{n} \tag{7.7-4}$$

we have

$$\delta_n \le 1 - R_c + \frac{1}{n} \tag{7.7-5}$$

Note that $d_{\min}/2$ is roughly the number of errors that a code can correct. Therefore,

$$\frac{1}{2}\delta_n \approx \frac{t}{n} \tag{7.7-6}$$

i.e., $\frac{1}{2}\delta_n$ approximately represents the fraction of correctable errors in transmission of n
bits.

If we define $\delta = \lim_{n \to \infty} \delta_n$, we conclude that as $n \to \infty$,

$$\delta \le 1 - R_c \tag{7.7-7}$$

This is the asymptotic form of the Singleton bound.


7.7-2 Hamming Bound 


The Hamming or sphere packing bound was previously developed in our study of the
performance of hard decision decoding and is given by Equation 7.5-11 as

$$2^{n-k} \ge \sum_{i=0}^{t} \binom{n}{i} \tag{7.7-8}$$

Taking the logarithm and dividing by n result in

$$1 - R_c \ge \frac{1}{n} \log_2 \sum_{i=0}^{t} \binom{n}{i} \tag{7.7-9}$$

or, with $t = \lfloor \frac{1}{2}(d_{\min} - 1) \rfloor$,

$$1 - R_c \ge \frac{1}{n} \log_2 \sum_{i=0}^{\lfloor (d_{\min}-1)/2 \rfloor} \binom{n}{i} \tag{7.7-10}$$

This relation gives an upper bound for $d_{\min}$ in terms of n and k, known as the Hamming
bound. Note that the proof of the Hamming bound is independent of the linearity of
the code; therefore this bound applies to all block codes. For q-ary block codes the
Hamming bound yields

$$1 - R_c \ge \frac{1}{n} \log_q \sum_{i=0}^{t} \binom{n}{i} (q - 1)^i \tag{7.7-11}$$

†The (n, n) code with $d_{\min} = 1$ is another MDS code, but this code introduces no redundancy and can
hardly be called a code.

In Problem 7.39 it is shown that for large n the sum on the right-hand side of
Equation 7.7-9 can be approximated by

$$\sum_{i=0}^{t} \binom{n}{i} \approx 2^{n H_b(t/n)} \tag{7.7-12}$$

where $H_b(\cdot)$ is the binary entropy function defined in Equation 6.2-6. Using this
approximation, and Equation 7.7-6, we see that the asymptotic form of the Hamming
bound for binary codes becomes

$$H_b\left(\frac{\delta}{2}\right) \le 1 - R_c \tag{7.7-13}$$

The Hamming bound is tight for high-rate codes.

As discussed before, a code satisfying the Hamming bound given by
Equation 7.7-10 with equality is called a perfect code. It has been shown by Tietäväinen
(1973) that the only binary perfect codes‡ are repetition codes with odd length,
Hamming codes, and the (23, 12) Golay code with minimum distance 7. Among nonbinary
codes, apart from the nonbinary Hamming codes, the only nontrivial perfect code is
the (11, 6) ternary Golay code with minimum distance 5.


7.7-3 Plotkin Bound 


The Plotkin bound due to Plotkin (1960) states that for any q-ary block code we have

$$\frac{d_{\min}}{n} \le \frac{q^k - q^{k-1}}{q^k - 1} \tag{7.7-14}$$

For binary codes this bound becomes

$$d_{\min} \le \frac{n 2^{k-1}}{2^k - 1} \tag{7.7-15}$$

The proof of the Plotkin bound for binary linear block codes is given in
Problem 7.40. The proof is based on noting that the minimum distance of a code cannot
exceed its average codeword weight.
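
The idea behind the proof can be illustrated numerically on the (5, 2) code of Example 7.5-1. The sketch below, with illustrative function names, compares the minimum weight with the average weight of the nonzero codewords and with the right-hand side of Equation 7.7-15.

```python
from itertools import product


def plotkin_limit(n: int, k: int) -> float:
    """Right-hand side of the binary Plotkin bound, Equation 7.7-15."""
    return n * 2 ** (k - 1) / (2**k - 1)


def nonzero_codeword_weights(G) -> list[int]:
    """Weights of all nonzero codewords uG (mod 2) of a binary linear code."""
    k = len(G)
    columns = list(zip(*G))
    weights = []
    for msg in product((0, 1), repeat=k):
        if any(msg):
            word = [sum(u * g for u, g in zip(msg, col)) % 2 for col in columns]
            weights.append(sum(word))
    return weights


G = [(1, 0, 1, 0, 1), (0, 1, 0, 1, 1)]  # generator matrix of Example 7.5-1
weights = nonzero_codeword_weights(G)
average = sum(weights) / len(weights)
print(min(weights), average, plotkin_limit(5, 2))
```

For this code the average nonzero weight equals the Plotkin limit 10/3, and the minimum distance 3 is the largest integer not exceeding it.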

The form of the Plotkin bound given in Equation 7.7-15 is effective for low rates. Another version of the Plotkin bound, given in Equation 7.7-16 for binary codes, is tighter for higher-rate codes:

$$d_{\min} \le \min_{1 \le j \le k} (n - k + j) \frac{2^{j-1}}{2^j - 1} \tag{7.7-16}$$
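To see how the two binary forms of the Plotkin bound compare, the sketch below evaluates Equations 7.7-15 and 7.7-16 with integer arithmetic. The function names are hypothetical, and the floor is taken only because $d_{\min}$ is an integer.

```python
def plotkin_basic(n, k):
    """Equation 7.7-15: d_min <= n 2^(k-1) / (2^k - 1), rounded down."""
    return (n * 2 ** (k - 1)) // (2 ** k - 1)

def plotkin_refined(n, k):
    """Equation 7.7-16: minimum over 1 <= j <= k, rounded down."""
    return min(((n - k + j) * 2 ** (j - 1)) // (2 ** j - 1)
               for j in range(1, k + 1))

for n, k in [(7, 3), (7, 4), (15, 11)]:
    print((n, k), plotkin_basic(n, k), plotkin_refined(n, k))
```

For the low-rate (7, 3) code both forms give 4, while for the high-rate (15, 11) code the second form is noticeably tighter.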


A simplified version of this bound, obtained by choosing $j = 1 + \lfloor \log_2 d_{\min} \rfloor$, results in

$$2 d_{\min} - 2 - \lfloor \log_2 d_{\min} \rfloor \le n - k \tag{7.7-17}$$

The asymptotic form of this bound with the assumption of $\delta \le \frac{1}{2}$ is

$$\delta \le \frac{1}{2}(1 - R_c) \tag{7.7-18}$$

†Here again an (n, 1) code can be considered as a trivial perfect code.


7.7-4 Elias Bound

The asymptotic form of the Elias bound (see Berlekamp (1968)) states that for any binary code with $\delta \le \frac{1}{2}$ we have

$$H_b\left(\frac{1}{2}\left(1 - \sqrt{1 - 2\delta}\right)\right) \le 1 - R_c \tag{7.7-19}$$

The Elias bound also applies to nonbinary codes. For nonbinary codes this bound states that for any q-ary code with $\delta \le 1 - \frac{1}{q}$ we have

$$H_q\left(\frac{q-1}{q}\left(1 - \sqrt{1 - \frac{q}{q-1}\,\delta}\right)\right) \le 1 - R_c \tag{7.7-20}$$

where $H_q(\cdot)$ is defined by

$$H_q(p) = -p \log_q p - (1-p) \log_q (1-p) + p \log_q (q-1) \tag{7.7-21}$$

for $0 \le p \le 1$.

7.7-5 McEliece-Rodemich-Rumsey-Welch (MRRW) Bound

The McEliece-Rodemich-Rumsey-Welch (MRRW) bound derived by McEliece et al. (1977) is the tightest known bound for low to moderate rates. This bound has two forms; the simpler form has the asymptotic form given by

$$R_c \le H_b\left(\frac{1}{2} - \sqrt{\delta(1 - \delta)}\right) \tag{7.7-22}$$

for binary codes and for $\delta \le \frac{1}{2}$. This bound is derived based on linear programming techniques.


7.7-6 Varshamov-Gilbert Bound 

All bounds stated so far give the necessary conditions that must be satisfied by the three main parameters n, k, and d of a block code. The Varshamov-Gilbert bound due to Gilbert (1952) and Varshamov (1957) gives the sufficient conditions for the existence of an (n, k) code with minimum distance $d_{\min}$. The Varshamov-Gilbert bound in fact goes further to prove the existence of a linear block code with the given parameters. The Varshamov-Gilbert bound states that if the inequality

$$\sum_{i=0}^{d-2} \binom{n-1}{i} (q-1)^i < q^{n-k} \tag{7.7-23}$$

is satisfied, then there exists a q-ary (n, k) linear block code with minimum distance $d_{\min} \ge d$. For the binary case the Varshamov-Gilbert bound becomes

$$\sum_{i=0}^{d-2} \binom{n-1}{i} < 2^{n-k} \tag{7.7-24}$$
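The existence condition can be evaluated directly for given parameters. The sketch below assumes the Varshamov form of the inequality, with the sum running over i = 0, ..., d - 2 of $\binom{n-1}{i}(q-1)^i$; the helper names are illustrative.

```python
from math import comb

def vg_guarantees(n, k, d, q=2):
    """True if the Varshamov-Gilbert inequality guarantees a q-ary (n, k)
    linear code with minimum distance at least d."""
    lhs = sum(comb(n - 1, i) * (q - 1) ** i for i in range(d - 1))  # i = 0..d-2
    return lhs < q ** (n - k)

def largest_guaranteed_distance(n, k, q=2):
    """Largest d for which the inequality still holds."""
    d = 1
    while d < n and vg_guarantees(n, k, d + 1, q):
        d += 1
    return d

print(largest_guaranteed_distance(7, 4))    # Hamming code parameters
print(largest_guaranteed_distance(23, 12))  # Golay code parameters
```

Because the bound is only a sufficient condition, a code with larger minimum distance may exist; the (23, 12) Golay code with minimum distance 7 is one such case.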

The asymptotic version of the Varshamov-Gilbert bound states that if for $0 < \delta \le 1 - \frac{1}{q}$ we have

$$H_q(\delta) < 1 - R_c \tag{7.7-25}$$

where $H_q(\cdot)$ is given by Equation 7.7-21, then there exists a q-ary $(n, R_c n)$ linear block code with minimum distance of at least $\delta n$.

A comparison of the asymptotic version of the bounds discussed above is shown in Figure 7.7-1 for the binary codes. As seen in the figure, the tightest asymptotic upper bounds are the Elias and the MRRW bounds. We add here that there exists a second version of the MRRW bound that is better than the Elias bound at higher rates. The ordering of the bounds shown on this plot is only an indication of how these bounds compare as $n \to \infty$. The region between the tightest upper bound and the Varshamov-Gilbert lower bound can still be a rather wide region for certain block lengths. For instance, for a (127, 33) code the best upper bound and lower bound yield $d_{\min} = 48$ and $d_{\min} = 32$, respectively (Verhoeff (1987)).

FIGURE 7.7-1
Comparison of Asymptotic Bounds.
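The asymptotic binary bounds can also be compared numerically at a single value of $\delta$. The sketch below assumes the forms in Equations 7.7-13, 7.7-18, 7.7-19, 7.7-22, and 7.7-25 with q = 2, each solved for the rate $R_c$; the dictionary keys are labels chosen here for illustration.

```python
from math import log2, sqrt

def hb(p):
    """Binary entropy function H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def rate_bounds(delta):
    """Asymptotic bounds on R_c for binary codes, valid for 0 < delta < 1/2."""
    return {
        "hamming": 1 - hb(delta / 2),                            # upper bound
        "plotkin": 1 - 2 * delta,                                # upper bound
        "elias": 1 - hb(0.5 * (1 - sqrt(1 - 2 * delta))),        # upper bound
        "mrrw": hb(0.5 - sqrt(delta * (1 - delta))),             # upper bound
        "varshamov_gilbert": 1 - hb(delta),                      # achievable rate
    }

for name, value in rate_bounds(0.2).items():
    print(f"{name:18s} {value:.4f}")
```

At $\delta = 0.2$ the achievable Varshamov-Gilbert rate lies below all four upper bounds, and the MRRW and Elias values are the smallest of the upper bounds, in line with the discussion of the figure.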


7.8 

MODIFIED LINEAR BLOCK CODES 

In many cases design techniques for linear block codes result in codes with certain parameters that might not be the exact parameters that are required for a certain application. For example, we have seen that for Hamming codes $n = 2^m - 1$ and $d_{\min} = 3$. In Section 7.10, we will see that the codeword lengths of BCH codes, which are widely used block codes, are equal to $2^m - 1$. Therefore, in many cases in order to change the parameters of a code, the code has to be modified. In this section we study the main methods for modification of linear block codes.


7.8-1 Shortening and Lengthening 

Let us assume C is an (n, k) linear block code with minimum distance $d_{\min}$. Shortening of C means choosing some $1 \le j < k$ and considering only the $2^{k-j}$ information sequences whose leading j bits are zero. Since these components carry no information, they can be deleted. The result is a shortened code. The resulting code is a systematic $(n - j, k - j)$ linear block code with rate $R_c = \frac{k-j}{n-j}$, which is less than the rate of the original code. Since the codewords of a shortened code are the result of removing j zeros from the codewords of C, the minimum weight of the shortened code is at least as large as the minimum weight of the original code. If j is large, the minimum weight of the shortened code is usually larger than the minimum weight of the original code.

EXAMPLE 7.8-1. A (15, 11) Hamming code can be shortened by 3 bits to obtain a (12, 8) shortened Hamming code, which carries 8 bits (1 byte) of information. The (15, 11) code can also be shortened by 7 bits to obtain an (8, 4) shortened Hamming code with parity check matrix

$$H = \begin{bmatrix} 0&1&1&1&1&0&0&0 \\ 1&0&1&1&0&1&0&0 \\ 1&1&0&1&0&0&1&0 \\ 1&1&1&0&0&0&0&1 \end{bmatrix} \tag{7.8-1}$$

This code has a minimum distance of 4.

EXAMPLE 7.8-2. Consider an (8, 4) linear block code with generator and parity check matrices given by

$$G = \begin{bmatrix} 1&1&1&1&1&1&1&1 \\ 0&1&0&1&1&1&0&0 \\ 0&0&1&0&1&1&1&0 \\ 0&0&0&1&0&1&1&1 \end{bmatrix} \qquad H = \begin{bmatrix} 1&1&1&1&1&1&1&1 \\ 0&0&0&1&0&1&1&1 \\ 0&0&1&0&1&1&1&0 \\ 0&1&0&0&1&0&1&1 \end{bmatrix} \tag{7.8-2}$$

Shortening this code by 1 bit results in a (7, 3) linear block code with the following generator and parity check matrices.

$$G = \begin{bmatrix} 1&0&1&1&1&0&0 \\ 0&1&0&1&1&1&0 \\ 0&0&1&0&1&1&1 \end{bmatrix} \qquad H = \begin{bmatrix} 1&1&1&1&1&1&1 \\ 0&0&1&0&1&1&1 \\ 0&1&0&1&1&1&0 \\ 1&0&0&1&0&1&1 \end{bmatrix} \tag{7.8-3}$$

Both codes have a minimum distance of 4.
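The shortening step of Example 7.8-2 can be reproduced by brute force. This is a sketch under a simplifying assumption: only one row of G has a 1 in the first column, so setting the corresponding information bit to zero amounts to dropping that row and the first column. The function names are illustrative.

```python
from itertools import product

def codewords(G):
    """All codewords uG (mod 2) of the binary code generated by the rows of G."""
    k, n = len(G), len(G[0])
    for u in product((0, 1), repeat=k):
        yield tuple(sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n))

def min_distance(G):
    """Minimum weight over the nonzero codewords (equals d_min for a linear code)."""
    return min(sum(c) for c in codewords(G) if any(c))

def shorten_first_bit(G):
    """Keep the rows whose first entry is 0 and delete the first column.
    Assumes exactly one row of G starts with a 1."""
    return [row[1:] for row in G if row[0] == 0]

G_8_4 = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1, 1],
]
G_7_3 = shorten_first_bit(G_8_4)
print(G_7_3)
print(min_distance(G_8_4), min_distance(G_7_3))
```

For a generator matrix whose first column has several 1s, a row reduction would be needed first; that case is not handled here.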


Shortened codes are used in a variety of applications. One example is the shortened 
Reed-Solomon codes used in CD recording where a (255, 251) Reed-Solomon code is 
shortened to a (32, 28) code. 

Lengthening a code is the inverse of the shortening operation. Here j extra infor- 
mation bits are added to the code to obtain an (n + j, k + j) linear block code. The 
rate of the lengthened code is higher than that of the original code, and its minimum 
distance cannot exceed the minimum distance of the original code. Obviously in the 
process of shortening and lengthening, the number of parity check bits of a code does 
not change. In Example 7.8-2 the (8, 4) code can be considered a lengthened version 
of the (7, 3) code. 


7.8-2 Puncturing and Extending 

Puncturing is a popular technique to increase the rate of a low-rate code. In puncturing an (n, k) code the number of information bits k remains unchanged whereas some components of the code are deleted (punctured). The result is an (n - j, k) linear block code with higher rate and possibly lower minimum distance. Obviously the minimum distance of a punctured code cannot be higher than the minimum distance of the original code.

EXAMPLE 7.8-3. The (8, 4) code of Example 7.8-2 can be punctured to obtain a (7, 4) code with

$$G = \begin{bmatrix} 1&1&0&1&0&0&0 \\ 0&1&1&0&1&0&0 \\ 0&0&1&1&0&1&0 \\ 0&0&0&1&1&0&1 \end{bmatrix} \qquad H = \begin{bmatrix} 0&0&1&0&1&1&1 \\ 0&1&0&1&1&1&0 \\ 1&0&0&1&0&1&1 \end{bmatrix} \tag{7.8-4}$$


The reverse of puncturing is extending a code. In extending a code, while k remains 
fixed, more parity check bits are added. The rate of the resulting code is lower, and the 
resulting minimum distance is at least as large as that of the original code. 

EXAMPLE 7.8-4. A (7, 4) Hamming code can be extended by adding an overall parity check bit. The resulting code is an (8, 4) extended Hamming code whose parity check matrix has a row of all 1s to check the overall parity. If the parity check matrix of the original Hamming code is an $(n - k) \times n$ matrix H, the parity check matrix of the extended Hamming code is given by

$$H_e = \begin{bmatrix} H & \mathbf{0} \\ \mathbf{1} & 1 \end{bmatrix} \tag{7.8-5}$$

where $\mathbf{1}$ denotes a $1 \times n$ row vector of 1s and $\mathbf{0}$ denotes an $(n - k) \times 1$ column vector of 0s.
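The effect of the overall parity bit can be checked on a small case. The sketch below uses the (7, 4) generator and parity check matrices of Equation 7.8-4 as an illustrative Hamming code, builds the extended parity check matrix in the block form of Equation 7.8-5, and compares minimum distances; all function names are assumptions made for this example.

```python
from itertools import product

G = [[1, 1, 0, 1, 0, 0, 0],
     [0, 1, 1, 0, 1, 0, 0],
     [0, 0, 1, 1, 0, 1, 0],
     [0, 0, 0, 1, 1, 0, 1]]
H = [[0, 0, 1, 0, 1, 1, 1],
     [0, 1, 0, 1, 1, 1, 0],
     [1, 0, 0, 1, 0, 1, 1]]

def codewords(G):
    """All codewords uG (mod 2)."""
    k, n = len(G), len(G[0])
    for u in product((0, 1), repeat=k):
        yield tuple(sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n))

def extend_codeword(c):
    """Append an overall parity bit so the total weight is even."""
    return c + (sum(c) % 2,)

def extend_parity_check(H):
    """Block form [[H, 0], [1, 1]]: a zero column, then an all-ones row."""
    n = len(H[0])
    return [row + [0] for row in H] + [[1] * (n + 1)]

def satisfies(H, c):
    """True if every parity check (row of H) is satisfied by c."""
    return all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)

extended = [extend_codeword(c) for c in codewords(G)]
H_e = extend_parity_check(H)
d_original = min(sum(c) for c in codewords(G) if any(c))
d_extended = min(sum(c) for c in extended if any(c))
print(d_original, d_extended, all(satisfies(H_e, c) for c in extended))
```

The extended code has only even-weight codewords, which is why the minimum distance increases from 3 to 4.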


7.8-3 Expurgation and Augmentation 

In these two modifications of a code, the block length n remains unchanged, and the number of information bits k is decreased in expurgation and increased in augmentation.

The result of expurgation of an (n, k) linear block code is an (n, k - j) code with lower rate whose minimum distance is guaranteed to be at least equal to the minimum distance of the original code. This can be done by eliminating j rows of the generator matrix G. The process of augmentation is the reverse of expurgation, in which $2^j$ (n, k) codes are merged to generate an (n, k + j) code.


7.9

CYCLIC CODES 

Cyclic codes are an important class of linear block codes. Additional structure built in the cyclic code family makes their algebraic decoding at reduced computational complexity possible. The important class of BCH codes and Reed-Solomon (RS) codes belongs to the class of cyclic codes. Cyclic codes were first introduced by Prange (1957).

7.9-1 Cyclic Codes: Definition and Basic Properties

Cyclic codes are a subset of the class of linear block codes that satisfy the following cyclic shift property: if $c = (c_{n-1}\ c_{n-2}\ \cdots\ c_1\ c_0)$ is a codeword of a cyclic code, then $(c_{n-2}\ c_{n-3}\ \cdots\ c_0\ c_{n-1})$, obtained by a cyclic shift of the elements of c, is also a codeword. That is, all cyclic shifts of c are codewords. As a consequence of the cyclic property, the codes possess a considerable amount of structure which can be exploited in the encoding and decoding operations. A number of efficient encoding and hard decision decoding algorithms have been devised for cyclic codes that make it possible to implement long block codes with a large number of codewords in practical communication systems. Our primary objective is to briefly describe a number of characteristics of cyclic codes, with emphasis on two important classes of cyclic codes, the BCH and Reed-Solomon codes.

In dealing with cyclic codes, it is convenient to associate with a codeword $c = (c_{n-1}\ c_{n-2}\ \cdots\ c_1\ c_0)$ a polynomial c(X) of degree at most n - 1, defined as

$$c(X) = c_{n-1}X^{n-1} + c_{n-2}X^{n-2} + \cdots + c_1 X + c_0 \tag{7.9-1}$$

For a binary code, each of the coefficients of the polynomial is either 0 or 1 . 

Now suppose we form the polynomial

$$Xc(X) = c_{n-1}X^{n} + c_{n-2}X^{n-1} + \cdots + c_1 X^2 + c_0 X$$

This polynomial cannot represent a codeword, since its degree may be equal to n (when $c_{n-1} = 1$). However, if we divide Xc(X) by $X^n + 1$, we obtain

$$\frac{Xc(X)}{X^n + 1} = c_{n-1} + \frac{c^{(1)}(X)}{X^n + 1} \tag{7.9-2}$$

where

$$c^{(1)}(X) = c_{n-2}X^{n-1} + c_{n-3}X^{n-2} + \cdots + c_0 X + c_{n-1}$$

Note that the polynomial $c^{(1)}(X)$ represents the codeword $c^{(1)} = (c_{n-2}\ \cdots\ c_0\ c_{n-1})$, which is just the codeword c shifted cyclically by one position. Since $c^{(1)}(X)$ is the remainder obtained by dividing Xc(X) by $X^n + 1$, we say that

$$c^{(1)}(X) = Xc(X) \bmod (X^n + 1) \tag{7.9-3}$$

In a similar manner, if c(X) represents a codeword in a cyclic code, then $X^i c(X) \bmod (X^n + 1)$ is also a codeword of the cyclic code. Thus we may write

$$X^i c(X) = Q(X)(X^n + 1) + c^{(i)}(X) \tag{7.9-4}$$

where the remainder polynomial $c^{(i)}(X)$ represents a codeword of the cyclic code, corresponding to i cyclic shifts of c to the right, and Q(X) is the quotient.
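The equivalence between a cyclic shift and multiplication by X modulo $X^n + 1$ can be verified with a few lines of code. In this sketch a binary polynomial is stored as a Python integer whose bit i is the coefficient of $X^i$, so a binary literal reads in the same order as $(c_{n-1} \cdots c_0)$; this representation and the function names are assumptions made for illustration.

```python
def gf2_mod(a, m):
    """Remainder of a(X) divided by m(X) over GF(2); polynomials are bit masks."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)  # cancel the leading term of a
    return a

def cyclic_shift(c, n, i=1):
    """Rotate the n-bit word c by i positions toward the high-order end."""
    i %= n
    mask = (1 << n) - 1
    return ((c << i) | (c >> (n - i))) & mask

n = 7
c = 0b1101000                      # X^6 + X^5 + X^3
modulus = (1 << n) | 1             # X^7 + 1
for i in range(n):
    assert cyclic_shift(c, n, i) == gf2_mod(c << i, modulus)
print(format(cyclic_shift(c, n), "07b"))
```

The loop confirms Equation 7.9-4 for every shift of one particular word; it is a numerical check, not a proof.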

We can generate a cyclic code by using a generator polynomial g(X) of degree n - k. The generator polynomial of an (n, k) cyclic code is a factor of $X^n + 1$ and has the general form

$$g(X) = X^{n-k} + g_{n-k-1}X^{n-k-1} + \cdots + g_1 X + 1 \tag{7.9-5}$$

We also define a message polynomial u(X)

$$u(X) = u_{k-1}X^{k-1} + u_{k-2}X^{k-2} + \cdots + u_1 X + u_0 \tag{7.9-6}$$

where $(u_{k-1}\ u_{k-2}\ \cdots\ u_1\ u_0)$ represent the k information bits. Clearly, the product u(X)g(X) is a polynomial of degree less than or equal to n - 1, which may represent a codeword. We note that there are $2^k$ polynomials $\{u_i(X)\}$, and hence there are $2^k$ possible codewords that can be formed from a given g(X).

Suppose we denote these codewords as

$$c_m(X) = u_m(X)g(X), \qquad m = 1, 2, \ldots, 2^k \tag{7.9-7}$$

To show that the codewords in Equation 7.9-7 satisfy the cyclic property, consider any codeword c(X) in Equation 7.9-7. A cyclic shift of c(X) produces

$$c^{(1)}(X) = Xc(X) + c_{n-1}(X^n + 1) \tag{7.9-8}$$

and since g(X) divides both $X^n + 1$ and c(X), it also divides $c^{(1)}(X)$; i.e., $c^{(1)}(X)$ can be represented as

$$c^{(1)}(X) = u_1(X)g(X)$$

Therefore, a cyclic shift of any codeword c(X) generated by Equation 7.9-7 yields another codeword.

From the above, we see that codewords possessing the cyclic property can be generated by multiplying the $2^k$ message polynomials with a unique polynomial g(X), called the generator polynomial of the (n, k) cyclic code, which divides $X^n + 1$ and has degree n - k. The cyclic code generated in this manner is a subspace $S_c$ of the vector space S. The dimension of $S_c$ is k.

It is clear from above that an (n, k) cyclic code can exist only if we can find a polynomial g(X) of degree n - k that divides $X^n + 1$. Therefore the problem of designing cyclic codes is equivalent to the problem of finding factors of $X^n + 1$. We have studied this problem for the case where $n = 2^m - 1$ for some positive integer m in the discussion following Equation 7.1-18, and we have seen that for this case the factors of $X^n + 1$ are the minimal polynomials corresponding to the conjugacy classes of nonzero elements of GF($2^m$). For general n, the study of the factorization of $X^n + 1$ is more involved. The interested reader is referred to the book by Wicker (1995). Table 7.9-1 presents factoring of $X^n + 1$. The representation in this table is in octal form; therefore the polynomial $X^3 + X^2 + 1$ is represented as 001101, which is equivalent to 15 in octal notation.

EXAMPLE 7.9-1. Consider a code with block length n = 7. The polynomial $X^7 + 1$ has the following factors:

$$X^7 + 1 = (X + 1)(X^3 + X^2 + 1)(X^3 + X + 1) \tag{7.9-9}$$

To generate a (7, 4) cyclic code, we may take as a generator polynomial one of the following two polynomials:

$$g_1(X) = X^3 + X^2 + 1$$

and

$$g_2(X) = X^3 + X + 1$$

The codes generated by $g_1(X)$ and $g_2(X)$ are equivalent. The codewords in the (7, 4) code generated by $g_1(X) = X^3 + X^2 + 1$ are given in Table 7.9-2.

TABLE 7.9-1
Factors of $X^n + 1$ Based on MacWilliams and Sloane (1977)

n     Factors
7     3.15.13
9     3.7.111
15    3.7.31.23.37
17    3.471.727
21    3.7.15.13.165.127
23    3.6165.5343
25    3.37.4102041
27    3.7.111.1001001
31    3.51.45.75.73.67.57
33    3.7.2251.3043.3777
35    3.15.13.37.16475.13627
39    3.7.17075.13617.17777
41    3.5747175.6647133
43    3.47771.52225.64213
45    3.7.31.23.27.111.11001.10011
47    3.75667061.43073357
49    3.15.13.10040001.10000201
51    3.7.661.471.763.433.727.637
55    3.37.3777.7164555.5551347
57    3.7.1341035.1735357.1777777
63    3.7.15.13.141.111.165.155.103.163.133.147.127
127   3.301.221.361.211.271.345.325.235.375.203.323.313.253.247.367.217.357.277


EXAMPLE 7.9-2. To determine the possible values of k for a cyclic code with block length n = 25, we use Table 7.9-1. From this table, the factors of $X^{25} + 1$ are 3, 37, and 4102041, which correspond to $X + 1$, $X^4 + X^3 + X^2 + X + 1$, and $X^{20} + X^{15} + X^{10} + X^5 + 1$. The possible (nontrivial) values for n - k are 1, 4, 20, and 5, 21, 24, where the latter three are obtained by multiplying pairs of the polynomials. These correspond to the values 24, 21, 20, 5, 4, and 1 for k.
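The bookkeeping in Example 7.9-2 can be automated. The sketch below reads the octal entries of Table 7.9-1, takes the degree of each factor from its bit length, and lists the values of k obtained from every proper, nonempty product of factors. It assumes the listed factors are distinct; the function names are illustrative.

```python
from itertools import combinations

def degree_from_octal(entry):
    """Degree of the binary polynomial written in octal, e.g. '15' -> X^3 + X^2 + 1."""
    return int(entry, 8).bit_length() - 1

def possible_k(n, octal_factors):
    """Nontrivial values of k for cyclic codes of length n."""
    degrees = [degree_from_octal(f) for f in octal_factors]
    if sum(degrees) != n:
        raise ValueError("factor degrees must add up to n")
    ks = set()
    for r in range(1, len(degrees)):            # proper, nonempty subsets
        for subset in combinations(degrees, r):
            ks.add(n - sum(subset))             # deg g(X) = n - k
    return sorted(ks)

print(possible_k(25, ["3", "37", "4102041"]))
print(possible_k(7, ["3", "15", "13"]))
```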


In general, the polynomial $X^n + 1$ may be factored as

$$X^n + 1 = g(X)h(X)$$

where g(X) denotes the generator polynomial for the (n, k) cyclic code and h(X) denotes the parity check polynomial that has degree k. The latter may be used to generate the dual code. For this purpose, we define the reciprocal polynomial of h(X) as

$$\begin{aligned} X^k h(X^{-1}) &= X^k \left( X^{-k} + h_{k-1}X^{-k+1} + h_{k-2}X^{-k+2} + \cdots + h_1 X^{-1} + 1 \right) \\ &= 1 + h_{k-1}X + h_{k-2}X^2 + \cdots + h_1 X^{k-1} + X^k \end{aligned} \tag{7.9-10}$$

TABLE 7.9-2
The (7, 4) Cyclic Code with Generator Polynomial $g_1(X) = X^3 + X^2 + 1$

Information Bits       Codewords
X^3 X^2 X^1 X^0        X^6 X^5 X^4 X^3 X^2 X^1 X^0
 0   0   0   0          0   0   0   0   0   0   0
 0   0   0   1          0   0   0   1   1   0   1
 0   0   1   0          0   0   1   1   0   1   0
 0   0   1   1          0   0   1   0   1   1   1
 0   1   0   0          0   1   1   0   1   0   0
 0   1   0   1          0   1   1   1   0   0   1
 0   1   1   0          0   1   0   1   1   1   0
 0   1   1   1          0   1   0   0   0   1   1
 1   0   0   0          1   1   0   1   0   0   0
 1   0   0   1          1   1   0   0   1   0   1
 1   0   1   0          1   1   1   0   0   1   0
 1   0   1   1          1   1   1   1   1   1   1
 1   1   0   0          1   0   1   1   1   0   0
 1   1   0   1          1   0   1   0   0   0   1
 1   1   1   0          1   0   0   0   1   1   0
 1   1   1   1          1   0   0   1   0   1   1


Clearly, the reciprocal polynomial is also a factor of $X^n + 1$. Hence, $X^k h(X^{-1})$ is the generator polynomial of an (n, n - k) cyclic code. This cyclic code is the dual code to the (n, k) code generated from g(X). Thus, the (n, n - k) dual code constitutes the null space of the (n, k) cyclic code.

EXAMPLE 7.9-3. Let us consider the dual code to the (7, 4) cyclic code generated in Example 7.9-1. This dual code is a (7, 3) cyclic code associated with the parity polynomial

$$h_1(X) = (X + 1)(X^3 + X + 1) = X^4 + X^3 + X^2 + 1 \tag{7.9-11}$$

The reciprocal polynomial is

$$X^4 h_1(X^{-1}) = 1 + X + X^2 + X^4$$

This polynomial generates the (7, 3) dual code given in Table 7.9-3. The reader can verify that the codewords in the (7, 3) dual code are orthogonal to the codewords in the (7, 4) cyclic code of Example 7.9-1. Note that neither the (7, 4) nor the (7, 3) codes are systematic.
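The orthogonality claimed in Example 7.9-3 can be verified exhaustively. In this sketch polynomials over GF(2) are stored as integers (bit i is the coefficient of $X^i$), codewords are formed as u(X)g(X), and every pair of codewords from the two codes is tested for an even number of common 1s. The helper names are assumptions made for this example.

```python
def gf2_mul(a, b):
    """Product of two binary polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def reciprocal(h, k):
    """X^k h(1/X): reverse the k + 1 coefficients of a degree-k polynomial."""
    return int(format(h, f"0{k + 1}b")[::-1], 2)

g1 = 0b1101                      # X^3 + X^2 + 1
h1 = gf2_mul(0b11, 0b1011)       # (X + 1)(X^3 + X + 1)
g_dual = reciprocal(h1, 4)       # generator of the (7, 3) dual code

code_7_4 = [gf2_mul(u, g1) for u in range(16)]
code_7_3 = [gf2_mul(u, g_dual) for u in range(8)]

def orthogonal(a, b):
    """Inner product over GF(2) of two words stored as bit masks."""
    return bin(a & b).count("1") % 2 == 0

print(format(h1, "b"), format(g_dual, "b"))
print(all(orthogonal(a, b) for a in code_7_4 for b in code_7_3))
```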

TABLE 7.9-3
The (7, 3) Dual Code with Generator Polynomial $X^4 h_1(X^{-1}) = X^4 + X^2 + X + 1$

Information Bits    Codewords
X^2 X^1 X^0         X^6 X^5 X^4 X^3 X^2 X^1 X^0
 0   0   0           0   0   0   0   0   0   0
 0   0   1           0   0   1   0   1   1   1
 0   1   0           0   1   0   1   1   1   0
 0   1   1           0   1   1   1   0   0   1
 1   0   0           1   0   1   1   1   0   0
 1   0   1           1   0   0   1   0   1   1
 1   1   0           1   1   1   0   0   1   0
 1   1   1           1   1   0   0   1   0   1

It is desirable to show how a generator matrix can be obtained from the generator polynomial of a cyclic (n, k) code. As previously indicated, the generator matrix for an (n, k) code can be constructed from any set of k linearly independent codewords. Hence, given the generator polynomial g(X), an easily generated set of k linearly independent codewords is the codewords corresponding to the set of k linearly independent polynomials

$$X^{k-1}g(X),\ X^{k-2}g(X),\ \ldots,\ Xg(X),\ g(X)$$

Since any polynomial of degree less than or equal to n - 1 and divisible by g(X) can be expressed as a linear combination of this set of polynomials, the set forms a basis of dimension k. Consequently, the codewords associated with these polynomials form a basis of dimension k for the (n, k) cyclic code.


EXAMPLE 7.9-4. The four rows of the generator matrix for the (7, 4) cyclic code with generator polynomial $g_1(X) = X^3 + X^2 + 1$ are obtained from the polynomials

$$X^i g_1(X) = X^{3+i} + X^{2+i} + X^i, \qquad i = 3, 2, 1, 0$$

It is easy to see that the generator matrix is

$$G_1 = \begin{bmatrix} 1&1&0&1&0&0&0 \\ 0&1&1&0&1&0&0 \\ 0&0&1&1&0&1&0 \\ 0&0&0&1&1&0&1 \end{bmatrix} \tag{7.9-12}$$

Similarly, the generator matrix for the (7, 4) cyclic code generated by the polynomial $g_2(X) = X^3 + X + 1$ is

$$G_2 = \begin{bmatrix} 1&0&1&1&0&0&0 \\ 0&1&0&1&1&0&0 \\ 0&0&1&0&1&1&0 \\ 0&0&0&1&0&1&1 \end{bmatrix} \tag{7.9-13}$$

The parity check matrices corresponding to $G_1$ and $G_2$ can be constructed in the same manner by using the respective reciprocal polynomials (see Problem 7.46).
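The construction of Example 7.9-4 amounts to writing the coefficients of g(X) in successive shifted positions. A minimal sketch follows, assuming the coefficients are listed from the highest power down; the function name is illustrative.

```python
def generator_matrix(g_coeffs, n):
    """Rows X^(k-1) g(X), ..., X g(X), g(X) of an (n, k) cyclic code.

    g_coeffs lists the coefficients of g(X) from X^(n-k) down to X^0.
    """
    k = n - (len(g_coeffs) - 1)
    return [[0] * i + list(g_coeffs) + [0] * (k - 1 - i) for i in range(k)]

G1 = generator_matrix([1, 1, 0, 1], 7)   # g1(X) = X^3 + X^2 + 1
G2 = generator_matrix([1, 0, 1, 1], 7)   # g2(X) = X^3 + X + 1
for row in G1:
    print(row)
```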


Shortened Cyclic Codes 

From Example 7.9-2 and Table 7.9-1 it is clear that we cannot design cyclic (n, k) codes for all values of n and k. One common approach to designing cyclic codes with given parameters is to begin with the design of an (n, k) cyclic code and then shorten it by j bits to obtain an (n - j, k - j) code. The shortening of the cyclic code is carried out by equating the j leading bits of the information sequence to zero and not transmitting them. The resulting codes are called shortened cyclic codes, although in general they are not cyclic codes. Of course by adding the deleted j zero bits at the receiver, we can decode these codes with any decoder designed for the original cyclic code.

Shortened cyclic codes are extensively used in the form of shortened Reed-Solomon 
codes and cyclic redundancy check (CRC) codes, which are widely used for error 
detection in computer communication networks. For more details on CRC codes, see 
Castagnoli et al. (1990) and Castagnoli et al. (1993). 


7.9-2 Systematic Cyclic Codes 


Note that the generator matrix obtained by this construction is not in systematic form. We can construct the generator matrix of a cyclic code in the systematic form

$$G = \begin{bmatrix} I_k & P \end{bmatrix}$$

from the generator polynomial as follows. First, we observe that the lth row of G corresponds to a polynomial of the form $X^{n-l} + R_l(X)$, l = 1, 2, ..., k, where $R_l(X)$ is a polynomial of degree less than n - k. This form can be obtained by dividing $X^{n-l}$ by g(X). Thus, we have

$$\frac{X^{n-l}}{g(X)} = Q_l(X) + \frac{R_l(X)}{g(X)}, \qquad l = 1, 2, \ldots, k$$

or, equivalently,

$$X^{n-l} = Q_l(X)g(X) + R_l(X), \qquad l = 1, 2, \ldots, k \tag{7.9-14}$$

where $Q_l(X)$ is the quotient. But $X^{n-l} + R_l(X)$ is a codeword of the cyclic code since $X^{n-l} + R_l(X) = Q_l(X)g(X)$. Therefore the desired polynomial corresponding to the lth row of G is $X^{n-l} + R_l(X)$.

EXAMPLE 7.9-5. For the (7, 4) cyclic code with generator polynomial $g_2(X) = X^3 + X + 1$, previously discussed in Example 7.9-4, we have

$$\begin{aligned} X^6 &= (X^3 + X + 1)g_2(X) + X^2 + 1 \\ X^5 &= (X^2 + 1)g_2(X) + X^2 + X + 1 \\ X^4 &= Xg_2(X) + X^2 + X \\ X^3 &= g_2(X) + X + 1 \end{aligned}$$

Hence, the generator matrix of the code in systematic form is

$$G_2 = \begin{bmatrix} 1&0&0&0&1&0&1 \\ 0&1&0&0&1&1&1 \\ 0&0&1&0&1&1&0 \\ 0&0&0&1&0&1&1 \end{bmatrix} \tag{7.9-15}$$

and the corresponding parity check matrix is

$$H_2 = \begin{bmatrix} 1&1&1&0&1&0&0 \\ 0&1&1&1&0&1&0 \\ 1&1&0&1&0&0&1 \end{bmatrix} \tag{7.9-16}$$

It is left as an exercise for the reader to demonstrate that the generator matrix $G_2$ given by Equation 7.9-13 and the systematic form given by Equation 7.9-15 generate the same set of codewords (see Problem 7.16).

The method for constructing the generator matrix G in systematic form according to Equation 7.9-14 also implies that a systematic code can be generated directly from the generator polynomial g(X). Suppose that we multiply the message polynomial u(X) by $X^{n-k}$. Thus, we obtain

$$X^{n-k}u(X) = u_{k-1}X^{n-1} + u_{k-2}X^{n-2} + \cdots + u_1 X^{n-k+1} + u_0 X^{n-k}$$

In a systematic code, this polynomial represents the first k bits in the codeword c(X). To this polynomial we must add a polynomial of degree less than n - k representing the parity check bits. Now, if $X^{n-k}u(X)$ is divided by g(X), the result is

$$\frac{X^{n-k}u(X)}{g(X)} = Q(X) + \frac{r(X)}{g(X)}$$

or, equivalently,

$$X^{n-k}u(X) = Q(X)g(X) + r(X) \tag{7.9-17}$$

where r(X) has degree less than n - k. Clearly, Q(X)g(X) is a codeword of the cyclic code. Hence, by adding (modulo-2) r(X) to both sides of Equation 7.9-17, we obtain the desired systematic code.

To summarize, the systematic code may be generated by

1. Multiplying the message polynomial u(X) by $X^{n-k}$

2. Dividing $X^{n-k}u(X)$ by g(X) to obtain the remainder r(X)

3. Adding r(X) to $X^{n-k}u(X)$
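The three steps above can be sketched in software before turning to hardware. Here binary polynomials are stored as integers with bit i holding the coefficient of $X^i$; this representation and the function names are assumptions chosen for illustration.

```python
def gf2_mod(a, m):
    """Remainder of a(X) divided by m(X) over GF(2); polynomials are bit masks."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def encode_systematic(u, g, n):
    """Systematic codeword for message polynomial u(X) and generator g(X)."""
    r = g.bit_length() - 1               # r = n - k parity bits
    if u >> (n - r):
        raise ValueError("message has more than k bits")
    shifted = u << r                     # step 1: X^(n-k) u(X)
    remainder = gf2_mod(shifted, g)      # step 2: r(X)
    return shifted ^ remainder           # step 3: X^(n-k) u(X) + r(X)

g2 = 0b1011                              # X^3 + X + 1
codeword = encode_systematic(0b0110, g2, 7)
print(format(codeword, "07b"))
```

Encoding the four unit messages 1000, 0100, 0010, and 0001 with this function gives the rows of the systematic generator matrix in Equation 7.9-15.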

Below we demonstrate how these computations can be performed by using shift 
registers with feedback. 

Since $X^n + 1 = g(X)h(X)$ or, equivalently, $g(X)h(X) = 0 \bmod (X^n + 1)$, we say that the polynomials g(X) and h(X) are orthogonal. Furthermore, the polynomials $X^i g(X)$ and $X^j h(X)$ are also orthogonal for all i and j. However, the vectors corresponding to the polynomials g(X) and h(X) are orthogonal only if the ordered elements of one of these vectors are reversed. The same statement applies to the vectors corresponding to $X^i g(X)$ and $X^j h(X)$. In fact, if the parity polynomial h(X) is used as a generator for the (n, n - k) dual code, the set of codewords obtained just comprises the same codewords generated by the reciprocal polynomial except that the code vectors are reversed. This implies that the generator matrix for the dual code obtained from the reciprocal polynomial $X^k h(X^{-1})$ can also be obtained indirectly from h(X). Since the parity check matrix H for the (n, k) cyclic code is the generator matrix for the dual code, it follows that H can also be obtained from h(X). The following example illustrates these relationships.


EXAMPLE 7.9-6. The dual code to the (7, 4) cyclic code generated by $g_1(X) = X^3 + X^2 + 1$ is the (7, 3) dual code that is generated by the reciprocal polynomial $X^4 h_1(X^{-1}) = X^4 + X^2 + X + 1$. However, we may also use $h_1(X)$ to obtain the generator matrix for the dual code. Then the matrix corresponding to the polynomials $X^i h_1(X)$, i = 2, 1, 0, is

$$G_{h1} = \begin{bmatrix} 1&1&1&0&1&0&0 \\ 0&1&1&1&0&1&0 \\ 0&0&1&1&1&0&1 \end{bmatrix}$$

The generator matrix for the (7, 3) dual code, which is the parity check matrix for the (7, 4) cyclic code, consists of the rows of $G_{h1}$ with the elements of each row taken in reverse order. Thus,

$$H_1 = \begin{bmatrix} 0&0&1&0&1&1&1 \\ 0&1&0&1&1&1&0 \\ 1&0&1&1&1&0&0 \end{bmatrix}$$

The reader may verify that $G_1 H_1^t = 0$. Note that the column vectors of $H_1$ consist of all seven binary vectors of length 3, except the all-zero vector. But this is just the description of the parity check matrix for a (7, 4) Hamming code. Therefore, the (7, 4) cyclic code is equivalent to the (7, 4) Hamming code.


7.9-3 Encoders for Cyclic Codes 

The encoding operations for generating a cyclic code may be performed by a linear 
feedback shift register based on the use of either the generator polynomial or the parity 
polynomial. First, let us consider the use of g(X). 

As indicated above, the generation of a systematic cyclic code involves three steps, namely, multiplying the message polynomial u(X) by $X^{n-k}$, dividing the product by g(X), and adding the remainder to $X^{n-k}u(X)$. Of these three steps, only the division is nontrivial.

The division of the polynomial $A(X) = X^{n-k}u(X)$ of degree n - 1 by the polynomial

$$g(X) = g_{n-k}X^{n-k} + g_{n-k-1}X^{n-k-1} + \cdots + g_1 X + g_0$$

may be accomplished by the (n - k)-stage feedback shift register illustrated in Figure 7.9-1. Initially, the shift register contains all zeros. The coefficients of A(X) are clocked into the shift register one (bit) coefficient at a time, beginning with the higher-order coefficients, i.e., with $a_{n-1}$, followed by $a_{n-2}$, and so on. After the kth shift, the first nonzero output of the quotient is $q_1 = g_{n-k}^{-1} a_{n-1}$. Subsequent outputs are generated as illustrated in Figure 7.9-1. For each output coefficient in the quotient, we must subtract the polynomial g(X) multiplied by that coefficient, as in ordinary long division. The subtraction is performed by means of the feedback part of the shift register. Thus, the feedback shift register in Figure 7.9-1 performs division of two polynomials.

In our case, $g_{n-k} = g_0 = 1$, and for binary codes the arithmetic operations are performed in modulo-2 arithmetic. Consequently, the subtraction operations reduce to modulo-2 addition. Furthermore, we are interested only in generating the parity check bits for each codeword, since the code is systematic. Consequently, the encoder for the cyclic code takes the form illustrated in Figure 7.9-2. The first k bits at the output of the encoder are simply the k information bits. These k bits are also clocked simultaneously into the shift register, since switch 1 is in the closed position. Note that the polynomial multiplication of $X^{n-k}$ with u(X) is not performed explicitly. After the k information bits are all clocked into the encoder, the positions of the two switches are reversed. At this time, the contents of the shift register are simply the n - k parity check bits, which correspond to the coefficients of the remainder polynomial. These n - k bits are clocked out one at a time and sent to the modulator.

FIGURE 7.9-1
A feedback shift register for dividing the polynomial A(X) by g(X).

FIGURE 7.9-2
Encoding a cyclic code by use of the generator polynomial g(X).

EXAMPLE 7.9-7. The shift register for encoding the (7, 4) cyclic code with generator polynomial $g(X) = X^3 + X + 1$ is illustrated in Figure 7.9-3. Suppose the input message bits are 0110. The contents of the shift register are as follows:

Input    Shift    Shift Register Contents
           0        000
  0        1        000
  1        2        110
  1        3        101
  0        4        100

Hence, the three parity check bits are 100, which correspond to the code bits $c_5 = 0$, $c_6 = 0$, and $c_7 = 1$.

FIGURE 7.9-3
The encoder for the (7, 4) cyclic code with generator polynomial $g(X) = X^3 + X + 1$.
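The register contents in Example 7.9-7 can be reproduced with a short simulation. The sketch assumes the usual arrangement for this kind of encoder, in which the incoming bit is added modulo 2 to the output of the last stage and the result is fed back through the taps $g_0, g_1, \ldots, g_{n-k-1}$; the exact wiring of the figure is not reproduced here, and the function name is illustrative.

```python
def lfsr_encode(message_bits, g_low_to_high):
    """Systematic encoding with an (n - k)-stage feedback shift register.

    g_low_to_high = [g_0, g_1, ..., g_{n-k}] with g_0 = g_{n-k} = 1.
    Returns the codeword and the register contents after every shift.
    """
    g = g_low_to_high
    r = len(g) - 1
    reg = [0] * r                        # reg[i] holds the coefficient of X^i
    history = [tuple(reg)]
    for bit in message_bits:             # highest-order message bit first
        feedback = bit ^ reg[-1]
        reg = [feedback] + [reg[i - 1] ^ (feedback & g[i]) for i in range(1, r)]
        history.append(tuple(reg))
    parity = reg[::-1]                   # sent highest-order coefficient first
    return list(message_bits) + parity, history

codeword, history = lfsr_encode([0, 1, 1, 0], [1, 1, 0, 1])  # g(X) = X^3 + X + 1
for shift, contents in enumerate(history):
    print(shift, "".join(map(str, contents)))
print(codeword)
```

The printed register contents follow the table of the example, and the final codeword ends in the parity bits 0, 0, 1.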

Instead of using the generator polynomial, we may implement the encoder for the cyclic code by making use of the parity polynomial

$$h(X) = X^k + h_{k-1}X^{k-1} + \cdots + h_1 X + 1$$

The encoder is shown in Figure 7.9-4. Initially, the k information bits are shifted into the shift register and simultaneously fed to the modulator. After all k information bits are in the shift register, the switch is thrown into position 2 and the shift register is clocked n - k times to generate the n - k parity check bits, as illustrated in Figure 7.9-4.

EXAMPLE 7.9-8. The parity polynomial for the (7, 4) cyclic code generated by $g(X) = X^3 + X + 1$ is $h(X) = X^4 + X^2 + X + 1$. The encoder for this code based on the parity polynomial is illustrated in Figure 7.9-5. If the input to the encoder is the message bits 0110, the parity check bits are $c_5 = 0$, $c_6 = 0$, and $c_7 = 1$, as is easily verified. Note that the encoder based on the generator polynomial is simpler when n - k < k (k > n/2), i.e., for high-rate codes ($R_c > \frac{1}{2}$), while the encoder based on the parity polynomial is simpler when k < n - k (k < n/2), which corresponds to low-rate codes ($R_c < \frac{1}{2}$).

FIGURE 7.9-4
The encoder for an (n, k) cyclic code based on the parity polynomial h(X).

FIGURE 7.9-5
The encoder for the (7, 4) cyclic code based on the parity polynomial $h(X) = X^4 + X^2 + X + 1$.

7.9-4 Decoding Cyclic Codes 

Syndrome decoding, described in Section 7.5, can be used for the decoding of cyclic codes. The cyclic structure of these codes makes it possible to implement syndrome computation and the decoding process using shift registers with considerably less complexity compared to the general class of linear block codes.

Let us assume that c is the transmitted codeword of a binary cyclic code and y = c + e is the received sequence at the output of the binary symmetric channel model (i.e., the channel output after the matched filter outputs have been passed through a binary quantizer). In terms of the corresponding polynomials, we can write

$$y(X) = c(X) + e(X) \tag{7.9-18}$$

and since c(X) is a codeword, it is a multiple of g(X), the generator polynomial of the code; i.e., c(X) = u(X)g(X) for some u(X), a polynomial of degree at most k - 1. Hence,

$$y(X) = u(X)g(X) + e(X) \tag{7.9-19}$$

From this relation we conclude

$$y(X) \bmod g(X) = e(X) \bmod g(X) \tag{7.9-20}$$

Let us define $s(X) = y(X) \bmod g(X)$ to denote the remainder of dividing y(X) by g(X) and call s(X) the syndrome polynomial, which is a polynomial of degree at most n - k - 1.

To compute the syndrome polynomial, we need to divide y(X) by the generator polynomial g(X) and find the remainder. Clearly s(X) depends on the error pattern and not on the codeword, and different error patterns can yield the same syndrome polynomials since the number of possible syndrome polynomials is $2^{n-k}$ and the number of possible error patterns is $2^n$. Maximum-likelihood decoding calls for finding the error pattern of the lowest weight corresponding to the computed syndrome polynomial s(X) and adding it to y(X) to obtain the most likely transmitted codeword polynomial c(X).

The division of $y(X)$ by the generator polynomial $g(X)$ may be carried out by means
of a shift register which performs division as described previously. First the received
vector y is shifted into an $(n - k)$-stage shift register as illustrated in Figure 7.9-6.
Initially, all the shift register contents are zero, and the switch is closed in position 1.
After the entire $n$-bit received vector has been shifted into the register, the contents
of the $n - k$ stages constitute the syndrome with the order of the bits numbered as
shown in Figure 7.9-6. These bits may be clocked out by throwing the switch into
position 2. Given the syndrome from the $(n - k)$-stage shift register, a table lookup may
be performed to identify the most probable error vector. Note that if the code is used for
error detection, a nonzero syndrome detects an error in transmission of the codeword.

FIGURE 7.9-6

An $(n - k)$-stage shift register for computing the syndrome.

Example 7.9-9. Let us consider the syndrome computation for the (7, 4) cyclic Hamming
code generated by the polynomial $g(X) = X^3 + X + 1$. Suppose that the received
vector is y = (1001101). This is fed into the three-stage register shown in Figure 7.9-7.
After seven shifts, the contents of the shift register are 110, which corresponds to the
syndrome s = (011). The most probable error vector corresponding to this syndrome
is e = (0001000) and, hence,

$$c = y + e = (1000101)$$

The information bits are 1 0 0 0.
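The register contents quoted in this example can be reproduced with a short simulation. The sketch below is not the book's circuit description: it assumes the usual division-register wiring (the input is added into the first stage, and the bit leaving the last stage is fed back to every stage where $g(X)$ has a nonzero coefficient) and assumes the leftmost bit of y enters first. The function name `syndrome_register` is hypothetical.

```python
def syndrome_register(received_bits, g_taps):
    """Simulate an (n-k)-stage division register (assumed wiring, see lead-in).

    received_bits: bits in the order they enter the register.
    g_taps: coefficients g_0 .. g_{n-k-1} of g(X); the leading 1 is implied.
    Returns the register contents (s_0 ... s_{n-k-1}) after every shift.
    """
    stages = len(g_taps)
    state = [0] * stages
    history = []
    for bit in received_bits:
        feedback = state[-1]                       # bit leaving the last stage
        new_state = [bit ^ (feedback & g_taps[0])]
        for i in range(1, stages):
            new_state.append(state[i - 1] ^ (feedback & g_taps[i]))
        state = new_state
        history.append("".join(str(b) for b in state))
    return history

# g(X) = 1 + X + X^3  ->  g_0 = 1, g_1 = 1, g_2 = 0
states = syndrome_register([1, 0, 0, 1, 1, 0, 1], [1, 1, 0])
print(states)   # register contents after shifts 1 through 7
```

Under these assumptions the final state is 110, in agreement with the example, and the intermediate states match the shift-by-shift listing that accompanies Figure 7.9-7.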

The table lookup decoding method using the syndrome is practical only when $n - k$
is small, e.g., when $n - k < 10$. This method is impractical for many interesting and
powerful codes. For example, if $n - k = 20$, the table has $2^{20}$ (approximately 1 million)
entries. Such a large amount of storage and the time required to locate an entry in such a
large table render the table lookup decoding method impractical for long codes having
large numbers of check bits.

Input: 1011001    Output: syndrome

Shift   Register contents
0       000
1       100
2       010
3       001
4       010
5       101
6       100
7       110

FIGURE 7.9-7

Syndrome computation for the (7, 4) cyclic code with generator polynomial
$g(X) = X^3 + X + 1$ and received vector y = (1001101).

The cyclic structure of the code can be used to simplify finding the error polynomial.
First we note that, as shown in Problem 7.54, if $s(X)$ is the syndrome corresponding to
error sequence $e(X)$, then the syndrome corresponding to $e^{(1)}(X)$, the right cyclic shift
of $e(X)$, is $s^{(1)}(X)$, defined by

$$s^{(1)}(X) = X s(X) \bmod g(X) \tag{7.9-21}$$

This means that to obtain the syndrome corresponding to $y^{(1)}$, we need to multiply $s(X)$
by $X$ and then divide by $g(X)$; but this is equivalent to shifting the content of the
shift register shown in Figure 7.9-6 to the right when the input is disconnected. This
means that the same combinatorial logic circuit that computes $e_{n-1}$ from $s$ can be used
to compute $e_{n-2}$ from a shifted version of $s$, i.e., $s^{(1)}$. The resulting decoder is known
as the Meggitt decoder (Meggitt (1961)).

The Meggitt decoder feeds the received sequence y into the syndrome computing
circuit to compute $s(X)$; the syndrome is fed into a combinatorial circuit that computes
$e_{n-1}$. The output of this circuit is added modulo-2 to $y_{n-1}$, and after correction and a
cyclic shift of the syndrome, the same combinatorial logic circuit computes $e_{n-2}$. This
process is repeated $n$ times, and if the error pattern is correctable, i.e., is one of the
coset leaders, the decoder is capable of correcting it.

For details on the structure of decoders for general cyclic codes, the interested
reader is referred to the texts of Peterson and Weldon (1972), Lin and Costello (2004),
Blahut (2003), Wicker (1995), and Berlekamp (1968).

7.9-5 Examples of Cyclic Codes

In this section we discuss certain examples of cyclic codes. We have selected the
cyclic Hamming, Golay, and maximum-length codes discussed previously as general
linear block codes. The most important class of cyclic codes, i.e., the BCH codes, is
discussed in Section 7.10.

Cyclic Hamming Codes

The class of cyclic codes includes the cyclic Hamming codes, which have a block length
$n = 2^m - 1$ and $n - k = m$ parity check bits, where $m$ is any positive integer. The cyclic
Hamming codes are equivalent to the Hamming codes described in Section 7.3-2.

Cyclic Golay Codes

The linear (23, 12) Golay code described in Section 7.3-6 can be generated as a cyclic
code by means of the generator polynomial

$$g(X) = X^{11} + X^9 + X^7 + X^6 + X^5 + X + 1 \tag{7.9-22}$$

The codewords have a minimum distance $d_{\min} = 7$.
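A generator polynomial of an $(n, k)$ cyclic code must divide $X^n + 1$ and have degree $n - k$. The following sketch checks both properties for the Golay generator polynomial above. Polynomials over GF(2) are stored as Python integers (bit $i$ is the coefficient of $X^i$), which is a representation chosen here for illustration; the helper names are hypothetical.

```python
def poly_mod(dividend, divisor):
    """Remainder of GF(2) polynomial division; bit i holds the coefficient of X^i."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        # cancel the current leading term of the dividend
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend

def from_exponents(exponents):
    """Build the integer representation from the exponents of the nonzero terms."""
    return sum(1 << e for e in exponents)

g = from_exponents([11, 9, 7, 6, 5, 1, 0])     # Golay generator, Equation 7.9-22
x23_plus_1 = from_exponents([23, 0])

remainder = poly_mod(x23_plus_1, g)
k = 23 - (g.bit_length() - 1)                  # n minus the degree of g(X)
print(remainder, k)                            # 0 12
```

A zero remainder confirms that $g(X)$ divides $X^{23} + 1$, and the degree 11 gives $k = 12$. This check says nothing about the minimum distance, which has to be established separately.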
Maximum-Length Shift Register Codes

Maximum-length shift register codes are a class of cyclic codes equivalent to the
maximum-length codes described in Section 7.3-3 as duals of Hamming codes. These
are a class of cyclic codes with

$$(n, k) = (2^m - 1, m) \tag{7.9-23}$$

where $m$ is a positive integer. The codewords are usually generated by means of an
$m$-stage digital shift register with feedback, based on the parity polynomial. For each
codeword to be transmitted, the $m$ information bits are loaded into the shift register,
and the switch is thrown from position 1 to position 2. The contents of the shift register
are shifted to the left one bit at a time for a total of $2^m - 1$ shifts. This operation
generates a systematic code with the desired output length $n = 2^m - 1$. For example,
the codewords generated by the $m = 3$ stage shift register in Figure 7.9-8 are listed in
Table 7.9-4.

Note that, with the exception of the all-zero codeword, all the codewords generated
by the shift register are different cyclic shifts of a single codeword. The reason for this
structure is easily seen from the state diagram of the shift register, which is illustrated
in Figure 7.9-9 for $m = 3$. When the shift register is loaded initially and shifted $2^m - 1$
times, it will cycle through all possible $2^m - 1$ states. Hence, the shift register is back
to its original state in $2^m - 1$ shifts. Consequently, the output sequence is periodic with
length $n = 2^m - 1$. Since there are $2^m - 1$ possible states, this length corresponds to the
largest possible period. This explains why the $2^m - 1$ nonzero codewords are different
cyclic shifts of a single codeword. Maximum-length shift register codes exist for any
positive value of $m$. Table 7.9-5 lists the stages connected to the modulo-2 adder that
result in a maximum-length shift register for $2 \le m \le 34$.

FIGURE 7.9-8

Three-stage ($m = 3$) shift register with feedback.

TABLE 7.9-4

Maximum-Length Shift Register Code for $m = 3$

Information Bits    Codewords
0 0 0               0 0 0 0 0 0 0
0 0 1               0 0 1 1 1 0 1
0 1 0               0 1 0 0 1 1 1
0 1 1               0 1 1 1 0 1 0
1 0 0               1 0 0 1 1 1 0
1 0 1               1 0 1 0 0 1 1
1 1 0               1 1 0 1 0 0 1
1 1 1               1 1 1 0 1 0 0

FIGURE 7.9-9

The seven states for the $m = 3$ maximum-length shift register.
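The entries of Table 7.9-4 can be regenerated with a three-term recurrence. The sketch below assumes the feedback rule $c_{i+3} = c_i \oplus c_{i+2}$, which was chosen because it reproduces the table; the exact wiring of Figure 7.9-8 is not visible in the text, so this rule should be read as an inference rather than as the figure's definition. The function names are hypothetical.

```python
from itertools import product

def mls_codeword(info_bits, n=7):
    """Extend 3 information bits to n bits using c[i+3] = c[i] XOR c[i+2] (assumed rule)."""
    c = list(info_bits)
    while len(c) < n:
        c.append(c[-3] ^ c[-1])
    return c

def cyclic_shifts(c):
    """All cyclic shifts of a codeword, as a set of tuples."""
    return {tuple(c[s:] + c[:s]) for s in range(len(c))}

codebook = {info: mls_codeword(info) for info in product([0, 1], repeat=3)}
weights = sorted(sum(c) for c in codebook.values())
nonzero = {tuple(c) for c in codebook.values() if any(c)}

for info, c in codebook.items():
    print(info, c)
print(weights)
print(nonzero == cyclic_shifts(codebook[(0, 0, 1)]))
```

With this rule every nonzero codeword has weight 4, and the seven nonzero codewords are exactly the cyclic shifts of 0011101, which is the structure described above.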

TABLE 7.9-5

Shift-Register Connections for Generating Maximum-Length Sequences
[from Forney (1970)].

 m   Stages Connected      m   Stages Connected      m   Stages Connected
     to Modulo-2 Adder         to Modulo-2 Adder         to Modulo-2 Adder
 2   1, 2                 13   1, 10, 11, 13        24   1, 18, 23, 24
 3   1, 3                 14   1, 5, 9, 14          25   1, 23
 4   1, 4                 15   1, 15                26   1, 21, 25, 26
 5   1, 4                 16   1, 5, 14, 16         27   1, 23, 26, 27
 6   1, 6                 17   1, 15                28   1, 26
 7   1, 7                 18   1, 12                29   1, 28
 8   1, 5, 6, 7           19   1, 15, 18, 19        30   1, 8, 29, 30
 9   1, 6                 20   1, 18                31   1, 29
10   1, 8                 21   1, 20                32   1, 11, 31, 32
11   1, 10                22   1, 22                33   1, 21
12   1, 7, 9, 12          23   1, 19                34   1, 8, 33, 34

Another characteristic of the codewords in a maximum-length shift register code
is that each codeword, with the exception of the all-zero codeword, contains $2^{m-1}$ ones
and $2^{m-1} - 1$ zeros, as shown in Problem 7.23. Hence all these codewords have identical
weights, namely, $w = 2^{m-1}$. Since the code is linear, this weight is also the minimum
distance of the code, i.e.,

$$d_{\min} = 2^{m-1}$$

As stated in Section 7.3-3, the maximum-length shift register code shown in Table 7.9-4
is identical to the (7, 3) code given in Table 7.9-3, which is the dual of the (7, 4)
Hamming code given in Table 7.9-2. The maximum-length shift register codes are the
dual codes of the cyclic Hamming $(2^m - 1, 2^m - 1 - m)$ codes. The shift register for
generating the maximum-length code may also be used to generate a periodic binary
sequence with period $n = 2^m - 1$. The binary periodic sequence exhibits a periodic
autocorrelation $R(m)$ with values $R(m) = n$ for $m = 0, \pm n, \pm 2n, \ldots$, and $R(m) = -1$
for all other shifts as described in Section 12.2-4. This impulselike autocorrelation
implies that the power spectrum is nearly white, and hence the sequence resembles
white noise. As a consequence, maximum-length sequences are called pseudo-noise
(PN) sequences and find use in the scrambling of data and in the generation of spread
spectrum signals as discussed in Chapter 12.
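The two-valued autocorrelation can be checked numerically for the shortest interesting case. The sketch below assumes the bits are mapped to amplitudes by 0 → +1 and 1 → −1 before correlating (the mapping is an assumption made here; the text defers the details to Section 12.2-4) and uses one period, 0011101, of the $m = 3$ sequence from Table 7.9-4.

```python
def periodic_autocorrelation(bits):
    """R(j) = sum_i x_i * x_{(i+j) mod n}, with bits mapped 0 -> +1 and 1 -> -1."""
    x = [1 - 2 * b for b in bits]
    n = len(x)
    return [sum(x[i] * x[(i + j) % n] for i in range(n)) for j in range(n)]

sequence = [0, 0, 1, 1, 1, 0, 1]        # one period of the m = 3 sequence
R = periodic_autocorrelation(sequence)
print(R)                                 # [7, -1, -1, -1, -1, -1, -1]
```

The value −1 at every nonzero shift follows from the weight property above: the sum of the sequence and any of its cyclic shifts is another nonzero codeword of weight $2^{m-1}$, so agreements and disagreements differ by exactly one.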


7.10

BOSE-CHAUDHURI-HOCQUENGHEM (BCH) CODES 

BCH codes comprise a large class of cyclic codes that include codes over both binary 
and nonbinary alphabets. BCH codes have rich algebraic structure that makes their 
decoding possible by using efficient algebraic decoding algorithms. In addition, BCH 
codes exist for a wide range of design parameters (rates and block lengths) and are well 
tabulated. It also turns out that BCH codes are among the best-known codes for low to 
moderate block lengths. 

Our study of BCH codes is rather brief, and the interested reader is referred to 
standard texts on coding theory including those by Wicker (1995), Lin and Costello 
(2004), Berlekamp (1968), and Peterson and Weldon (1972) for details and proofs. 


7.10-1 The Structure of BCH Codes 

BCH codes are a subclass of cyclic codes that were introduced independently by Bose
and Ray-Chaudhuri (1960a, 1960b) and Hocquenghem (1959). These codes have rich
algebraic structure that makes it possible to design efficient algebraic decoding algorithms
for them.

Since BCH codes are cyclic codes, we can describe them in terms of their generator
polynomial $g(X)$. In this section we treat only a special class of binary BCH codes
called primitive binary BCH codes. These codes have a block length of $n = 2^m - 1$
for some integer $m \ge 3$, and they can be designed to have a guaranteed error-correction
capability of at least $t$ errors for any $t < 2^{m-1}$. In fact for any two positive
integers $m \ge 3$ and $t < 2^{m-1}$ we can design a BCH code whose parameters satisfy the
following relations:

$$\begin{aligned} n &= 2^m - 1 \\ n - k &\le mt \\ d_{\min} &\ge 2t + 1 \end{aligned} \tag{7.10-1}$$

The first equality determines the block length of the code. The second inequality
provides a bound on the number of parity check bits of the code, and the third inequality
states that this code is capable of correcting at least $t$ errors. The resulting code is called
a $t$-error-correcting BCH code, although it is possible that this code can correct more
than $t$ errors.

The Generator Polynomial for BCH Codes 

To design a $t$-error-correcting (primitive) BCH code, we choose $\alpha$, a primitive element
of GF($2^m$). Then $g(X)$, the generator polynomial of the BCH code, is defined as the
lowest-degree polynomial over GF(2) such that $\alpha, \alpha^2, \alpha^3, \ldots, \alpha^{2t}$ are its
roots.

Using the definition of the minimal polynomial of a field element given in Section 7.1-1
and by Equation 7.1-12, we know that any polynomial over GF(2) that has
$\beta \in$ GF($2^m$) as a root is divisible by $\phi_\beta(X)$, the minimal polynomial of $\beta$. Therefore
$g(X)$ must be divisible by $\phi_{\alpha^i}(X)$ for $1 \le i \le 2t$. Since $g(X)$ is a polynomial of lowest
degree with this property, we conclude that

$$g(X) = \mathrm{LCM}\{\phi_{\alpha^i}(X),\ 1 \le i \le 2t\} \tag{7.10-2}$$

where LCM denotes the least common multiple of the $\phi_{\alpha^i}(X)$'s. Also note that, for instance,
the $\phi_{\alpha^i}(X)$ for $i = 1, 2, 4, \ldots$ are the same since $\alpha, \alpha^2, \alpha^4, \ldots$ are conjugates and hence
they have the same minimal polynomial. The same is true for $\alpha^3, \alpha^6, \alpha^{12}, \ldots$. Therefore,
in the expression for $g(X)$ it is sufficient to consider only odd values of $i$, i.e.,

$$g(X) = \mathrm{LCM}\{\phi_{\alpha}(X), \phi_{\alpha^3}(X), \phi_{\alpha^5}(X), \ldots, \phi_{\alpha^{2t-1}}(X)\} \tag{7.10-3}$$

and since the degree of $\phi_{\alpha^i}(X)$ does not exceed $m$, the degree of $g(X)$ is at most $mt$.
Therefore, $n - k \le mt$.

Let us assume that $c(X)$ is a codeword polynomial of the designed BCH code.
From the cyclic property of the code we know that $g(X)$ is a divisor of $c(X)$. Therefore,
all $\alpha^i$ for $1 \le i \le 2t$ are roots of $c(X)$; i.e., for any codeword polynomial $c(X)$ we
have

$$c(\alpha^i) = 0 \qquad 1 \le i \le 2t \tag{7.10-4}$$

The conditions given in Equation 7.10-4 are necessary and sufficient conditions for a
polynomial of degree less than $n$ to be a codeword polynomial of the BCH code.

Example 7.10-1. To design a single-error-correcting ($t = 1$) BCH code with block
length $n = 15$ ($m = 4$), we choose $\alpha$, a primitive element in GF($2^4$). The minimal
polynomial of $\alpha$ is a primitive polynomial of degree 4.
From Table 7.1-5 we see that $g(X) = \phi_\alpha(X) = X^4 + X + 1$. Therefore, $n - k = 4$
and $k = 11$. Since the weight of $g(X)$ is 3, we have $d_{\min} \le 3$. Combining this
with Equation 7.10-1, which states $d_{\min} \ge 2t + 1 = 3$, we conclude that $d_{\min} = 3$.
Therefore a single-error-correcting BCH code with block length 15 is a (15, 11) code
with $d_{\min} = 3$. This is, in fact, a cyclic Hamming code. In general, cyclic Hamming
codes are single-error-correcting BCH codes.

Example 7.10-2. To design a four-error-correcting ($t = 4$) BCH code with block
length $n = 15$ ($m = 4$), we choose $\alpha$, a primitive element in GF($2^4$). The minimal
polynomial of $\alpha$ is $\phi_\alpha(X) = X^4 + X + 1$. We also need to find the minimal
polynomials of $\alpha^3$, $\alpha^5$, and $\alpha^7$.

From Example 7.1-5 we have $\phi_{\alpha^3}(X) = X^4 + X^3 + X^2 + X + 1$, $\phi_{\alpha^5}(X) = X^2 + X + 1$,
and $\phi_{\alpha^7}(X) = X^4 + X^3 + 1$. Therefore,

$$\begin{aligned} g(X) &= (X^4 + X + 1)(X^4 + X^3 + X^2 + X + 1)(X^2 + X + 1)(X^4 + X^3 + 1) \\ &= X^{14} + X^{13} + X^{12} + X^{11} + X^{10} + X^9 + X^8 + X^7 + X^6 + X^5 + X^4 + X^3 + X^2 + X + 1 \end{aligned} \tag{7.10-5}$$

Hence $n - k = 14$ and $k = 1$; the resulting code is a (15, 1) repetition code with
$d_{\min} = 15$. Note that this code was designed to correct four errors but it is capable of
correcting up to seven errors.

Example 7.10-3. To design a double-error-correcting BCH code with block length
$n = 15$ ($m = 4$), we need the minimal polynomials of $\alpha$ and $\alpha^3$. The minimal
polynomial of $\alpha$ is $\phi_\alpha(X) = X^4 + X + 1$, and from Example 7.1-5,
$\phi_{\alpha^3}(X) = X^4 + X^3 + X^2 + X + 1$. Therefore,

$$\begin{aligned} g(X) &= (X^4 + X + 1)(X^4 + X^3 + X^2 + X + 1) \\ &= X^8 + X^7 + X^6 + X^4 + 1 \end{aligned} \tag{7.10-6}$$

Hence $n - k = 8$ and $k = 7$, and the resulting code is a (15, 7) BCH code with $d_{\min} = 5$.
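The generator polynomials of Examples 7.10-1 to 7.10-3 can be recomputed from Equation 7.10-2. The sketch below builds GF($2^4$) from the primitive polynomial $X^4 + X + 1$, forms each minimal polynomial as the product of $(X + \alpha^e)$ over a conjugacy class, and multiplies the distinct minimal polynomials (distinct minimal polynomials are coprime, so their product is the LCM). All function names are hypothetical, and GF(2) polynomials are stored as integers with bit $i$ holding the coefficient of $X^i$.

```python
def gf16_tables():
    """exp/log tables for GF(2^4), with alpha a root of X^4 + X + 1."""
    exp, log = [0] * 30, [0] * 16
    x = 1
    for i in range(15):
        exp[i] = exp[i + 15] = x          # doubled so log sums index directly
        log[x] = i
        x <<= 1                           # multiply by alpha
        if x & 0b10000:                   # reduce with alpha^4 = alpha + 1
            x ^= 0b10011
    return exp, log

EXP, LOG = gf16_tables()

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def minimal_poly(i):
    """Minimal polynomial of alpha^i over GF(2), returned as an integer bit mask."""
    conjugates, e = [], i % 15
    while e not in conjugates:            # alpha^i, alpha^(2i), alpha^(4i), ...
        conjugates.append(e)
        e = (2 * e) % 15
    poly = [1]                            # coefficients, lowest degree first
    for e in conjugates:                  # multiply by (X + alpha^e)
        nxt = [0] * (len(poly) + 1)
        for d, c in enumerate(poly):
            nxt[d] ^= gf_mul(c, EXP[e])
            nxt[d + 1] ^= c
        poly = nxt
    return sum(c << d for d, c in enumerate(poly))   # coefficients end up in {0, 1}

def gf2_mul(a, b):
    """Carry-less product of two GF(2) polynomials stored as integers."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def bch_generator(t):
    """g(X) for the primitive t-error-correcting BCH code of length 15."""
    g = 1
    for factor in {minimal_poly(i) for i in range(1, 2 * t + 1)}:
        g = gf2_mul(g, factor)
    return g

for t in (1, 2, 3, 4):
    g = bch_generator(t)
    print(t, oct(g), "n - k =", g.bit_length() - 1)
```

For $t = 2$ the result is $X^8 + X^7 + X^6 + X^4 + 1$ (octal 721), and for $t = 4$ it is the all-ones polynomial of degree 14, matching Equations 7.10-6 and 7.10-5.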

Table 7.10-1 lists the coefficients of generator polynomials for BCH codes of block
lengths $7 \le n \le 255$, corresponding to $3 \le m \le 8$. The coefficients are given in octal
form, with the leftmost digit corresponding to the highest-degree term of the generator
polynomial. Thus, the coefficients of the generator polynomial for the (15, 5) code are
2467, which in binary form is 10100110111. Consequently, the generator polynomial
is $g(X) = X^{10} + X^8 + X^5 + X^4 + X^2 + X + 1$. A more extensive list of generator
polynomials for BCH codes is given by Peterson and Weldon (1972), who tabulated
the polynomial factors of $X^{2^m - 1} + 1$ for $m \le 34$.
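Reading an octal table entry as a polynomial is a mechanical step that a few lines of code make explicit. The helper below (a hypothetical name, written for this illustration) converts the octal string to an integer and lists the exponents of the nonzero terms, highest degree first.

```python
def octal_to_exponents(octal_digits):
    """Exponents of the nonzero terms of g(X), highest degree first."""
    value = int(octal_digits, 8)
    return [e for e in range(value.bit_length() - 1, -1, -1) if (value >> e) & 1]

print(octal_to_exponents("2467"))   # [10, 8, 5, 4, 2, 1, 0]
print(octal_to_exponents("721"))    # [8, 7, 6, 4, 0]
```

The first result corresponds to $X^{10} + X^8 + X^5 + X^4 + X^2 + X + 1$ for the (15, 5) code; the second is the generator polynomial of the (15, 7) code from Example 7.10-3.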

Let us consider from Table 7.10-1 the sequence of BCH codes with triplet parameters
$(n, k, t)$ such that for these codes $R_c$ is close to $1/2$. These codes include (7, 4, 1),
(15, 7, 2), (31, 16, 3), (63, 30, 6), (127, 64, 10), and (255, 131, 18) codes. We observe
that as $n$ increases and the rate remains almost constant, the ratio $t/n$, that is, the fraction
of errors that the code can correct, decreases. In fact for all BCH codes with constant
rate, as the block length increases, the fraction of correctable errors goes to zero. This
shows that the BCH codes are asymptotically bad, and for large $n$ their $\delta_n$ falls below
the Varshamov-Gilbert bound. We need, however, to keep in mind that this happens at
large values of $n$; for small to moderate values of $n$, which include the most practical
cases, these codes remain among the best-known codes for which efficient decoding
algorithms are known.

TABLE 7.10-1

Coefficients of Generator Polynomials (in Octal Form) for BCH Codes of Length $7 \le n \le 255$

  n     k    t   g(X)
  7     4    1   13
 15    11    1   23
        7    2   721
        5    3   2467
 31    26    1   45
       21    2   3551
       16    3   107657
       11    5   5423325
        6    7   313365047
 63    57    1   103
       51    2   12471
       45    3   1701317
       39    4   166623567
       36    5   1033500423
       30    6   157464165547
       24    7   17323260404441
       18   10   1363026512351725
       16   11   6331141367235453
       10   13   472622305527250155
        7   15   5231045543503271737
127   120    1   211
      113    2   41567
      106    3   11554743
       99    4   3447023271
       92    5   624730022327
       85    6   130704476322273
       78    7   26230002166130115
       71    9   6255010713253127753
       64   10   1206534025570773100045
       57   11   33526525205705053517721
       50   13   54446512523314012421501421
       43   14   17721772213651227521220574343
       36   15   3146074666522075044764574721735
       29   21   403114461367670603667530141176155
       22   23   123376070404722522435445626637647043
       15   27   22057042445604554770523013762217604353
        8   31   7047264052751030651476224271567733130217
255   247    1   435
      239    2   267543
      231    3   156720665
      223    4   75626641375
      215    5   23157564726421
      207    6   16176560567636227
      199    7   7633031270420722341
      191    8   2663470176115333714567
      187    9   52755313540001322236351
      179   10   22624710717340432416300455
      171   11   1541621421234235607706163067
      163   12   7500415510075602551574724514601
      155   13   3757513005407665015722506464677633
      147   14   1642130173537165525304165305441011711
      139   15   461401732060175561570722730247453567445
      131   18   215713331471510151261250277442142024165471
      123   19   120614052242066003717210326516141226272506267
      115   21   60526665572100247263636404600276352556313472737
      107   22   22205772322066256312417300235347420176574750154441
       99   23   10656667253473174222741416201574332252411076432303431
       91   25   6750265030327444172723631724732511075550762720724344561
       87   26   110136763414743236435231634307172046206722545273311721317
       79   27   66700035637657500020270344207366174621015326711766541342355
       71   29   24024710520644321515554172112331163205444250362557643221706035
       63   30   10754475055163544325315217357707003666111726455267613656702543301
       55   31   7315425203501100133015275306032054325414326755010557044426035473617
       47   42   2533542017062646563033041377406233175123334145446045005066024552543173
       45   43   15202056055234161131101346376423701563670024470762373033202157025051541
       37   45   5136330255067007414177447447245437530420735706174323432347644354737403044003
       29   47   3025715536673071465527064012361377115342242324201174114060254757410403565037
       21   55   1256215257060332656001773153607612103227341405653074542521153121614466513473725
       13   59   464173200505256454442657371425006600433067744547656140317467721357026134460500547
        9   63   15726025217472463201031043255355134614162367212044074545112766115547705561677516057


7.10-2 Decoding BCH Codes 

Since BCH codes are cyclic codes, any decoding algorithm for cyclic codes can be 
applied to BCH codes. For instance, BCH codes can be decoded using a Meggitt decoder.
However, the additional structure in BCH codes makes it possible to use more efficient 
decoding algorithms, particularly when using codes with long block lengths. 

Let us assume that a codeword c is associated with codeword polynomial $c(X)$. By
Equation 7.10-4, we know that $c(\alpha^i) = 0$ for $1 \le i \le 2t$. Let us assume that the error
polynomial is $e(X)$ and the received polynomial is $y(X)$. Then

$$y(X) = c(X) + e(X) \tag{7.10-7}$$

Let us denote the value of $y(X)$ at $\alpha^i$ by $S_i$, i.e., the syndromes defined by

$$S_i = y(\alpha^i) = c(\alpha^i) + e(\alpha^i) = e(\alpha^i) \qquad 1 \le i \le 2t \tag{7.10-8}$$

Obviously if $e(X)$ is zero, or it is equal to a nonzero codeword, the syndromes are
all zero. The syndrome can be computed from the received sequence y using GF($2^m$)
arithmetic.
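As an illustration of the GF($2^m$) arithmetic involved, the sketch below evaluates Equation 7.10-8 over GF($2^4$), assuming $\alpha$ is a root of $X^4 + X + 1$. The received polynomial $y(X) = X^3 + 1$ and $t = 2$ are inputs chosen for the illustration, and the function names are hypothetical. Field elements are stored as 4-bit integers and printed as powers of $\alpha$.

```python
def gf16_tables():
    """exp/log tables for GF(2^4), with alpha a root of X^4 + X + 1."""
    exp, log = [0] * 30, [0] * 16
    x = 1
    for i in range(15):
        exp[i] = exp[i + 15] = x
        log[x] = i
        x <<= 1                           # multiply by alpha
        if x & 0b10000:                   # reduce with alpha^4 = alpha + 1
            x ^= 0b10011
    return exp, log

EXP, LOG = gf16_tables()

def syndromes(term_exponents, t, n=15):
    """[S_1, ..., S_2t] with S_i = y(alpha^i); y(X) is the sum of X^j over term_exponents."""
    out = []
    for i in range(1, 2 * t + 1):
        s = 0
        for j in term_exponents:
            s ^= EXP[(i * j) % n]         # add alpha^(i*j); field addition is XOR
        out.append(s)
    return out

S = syndromes([3, 0], t=2)                # y(X) = X^3 + 1
S_as_powers = [LOG[s] for s in S]         # valid here because no syndrome is zero
print(S_as_powers)                        # [14, 13, 7, 11]
```

The printed exponents mean $S_1 = \alpha^{14}$, $S_2 = \alpha^{13}$, $S_3 = \alpha^{7}$, and $S_4 = \alpha^{11}$. A zero syndrome has no logarithm, so a general implementation would need to treat that case separately.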

Now let us assume there have been $\nu$ errors in transmission of c, where $\nu \le t$. Let
us denote the location of these errors by $j_1, j_2, \ldots, j_\nu$, where without loss of generality
we may assume $0 \le j_1 < j_2 < \cdots < j_\nu \le n - 1$. Therefore

$$e(X) = X^{j_\nu} + X^{j_{\nu-1}} + \cdots + X^{j_2} + X^{j_1} \tag{7.10-9}$$

From Equations 7.10-8 and 7.10-9 we conclude that

$$\begin{aligned} S_1 &= \alpha^{j_1} + \alpha^{j_2} + \cdots + \alpha^{j_\nu} \\ S_2 &= (\alpha^{j_1})^2 + (\alpha^{j_2})^2 + \cdots + (\alpha^{j_\nu})^2 \\ &\;\;\vdots \\ S_{2t} &= (\alpha^{j_1})^{2t} + (\alpha^{j_2})^{2t} + \cdots + (\alpha^{j_\nu})^{2t} \end{aligned} \tag{7.10-10}$$

These are a set of $2t$ equations in $\nu$ unknowns, namely, $j_1, j_2, \ldots, j_\nu$, or equivalently
$\alpha^{j_i}$, $1 \le i \le \nu$. Any method for solving simultaneous equations can be applied to
find the unknowns $\alpha^{j_i}$, from which the error locations $j_1, j_2, \ldots, j_\nu$ can be found. Having
determined the error locations, we change the received bit at those locations to find the
transmitted codeword c.

By defining error location numbers $\beta_i = \alpha^{j_i}$ for $1 \le i \le \nu$, Equation 7.10-10
becomes

$$\begin{aligned} S_1 &= \beta_1 + \beta_2 + \cdots + \beta_\nu \\ S_2 &= \beta_1^2 + \beta_2^2 + \cdots + \beta_\nu^2 \\ &\;\;\vdots \\ S_{2t} &= \beta_1^{2t} + \beta_2^{2t} + \cdots + \beta_\nu^{2t} \end{aligned} \tag{7.10-11}$$

Solving this set of equations determines $\beta_i$ for $1 \le i \le \nu$, from which the error locations can
be determined. Obviously the $\beta_i$'s are members of GF($2^m$), and solving these equations
requires arithmetic over GF($2^m$). This set of equations in general has many solutions.
For maximum-likelihood (minimum Hamming distance) decoding we are interested in
a solution with the smallest number of $\beta$'s.

To solve these equations, we introduce the error locator polynomial as

$$\begin{aligned} \sigma(X) &= (1 + \beta_1 X)(1 + \beta_2 X) \cdots (1 + \beta_\nu X) \\ &= \sigma_\nu X^\nu + \sigma_{\nu-1} X^{\nu-1} + \cdots + \sigma_1 X + \sigma_0 \end{aligned} \tag{7.10-12}$$

whose roots are $\beta_i^{-1}$ for $1 \le i \le \nu$. Finding the roots of this polynomial determines
the location of errors. We need to determine $\sigma_i$ for $0 \le i \le \nu$ to have $\sigma(X)$, from which
we can find the roots and hence locate the errors. Expanding Equation 7.10-12 results
in the following set of equations:

$$\begin{aligned} \sigma_0 &= 1 \\ \sigma_1 &= \beta_1 + \beta_2 + \cdots + \beta_\nu \\ \sigma_2 &= \beta_1\beta_2 + \beta_1\beta_3 + \cdots + \beta_{\nu-1}\beta_\nu \\ &\;\;\vdots \\ \sigma_\nu &= \beta_1\beta_2 \cdots \beta_\nu \end{aligned} \tag{7.10-13}$$

Using Equations 7.10-10 and 7.10-13, we obtain the following set of equations relating
the coefficients of $\sigma(X)$ and the syndromes.

$$\begin{aligned} S_1 + \sigma_1 &= 0 \\ S_2 + \sigma_1 S_1 + 2\sigma_2 &= 0 \\ S_3 + \sigma_1 S_2 + \sigma_2 S_1 + 3\sigma_3 &= 0 \\ &\;\;\vdots \\ S_\nu + \sigma_1 S_{\nu-1} + \cdots + \sigma_{\nu-1} S_1 + \nu\sigma_\nu &= 0 \\ S_{\nu+1} + \sigma_1 S_\nu + \cdots + \sigma_{\nu-1} S_2 + \sigma_\nu S_1 &= 0 \\ &\;\;\vdots \end{aligned} \tag{7.10-14}$$

We need to obtain the lowest-degree polynomial $\sigma(X)$ whose coefficients satisfy this
set of equations. After determining $\sigma(X)$, we have to find its roots $\beta_i^{-1}$. The inverse
of the roots provides the location of the errors. Note that when the polynomial of the
lowest degree $\sigma(X)$ is found, we can simply find its roots over GF($2^m$) by substituting
the $2^m$ field elements in the polynomial.
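The root search by substitution can be sketched as follows over GF($2^4$), assuming $\alpha$ is a root of $X^4 + X + 1$. The polynomial $\sigma(X) = 1 + \alpha^{14}X + \alpha^{3}X^{2}$ is an input chosen for illustration, and the function names are hypothetical. Since $\sigma(0) = 1$, only the nonzero field elements need to be tried; trying $X = \alpha^{-j}$ directly yields the location $j$ whenever the value is zero.

```python
def gf16_tables():
    """exp/log tables for GF(2^4), with alpha a root of X^4 + X + 1."""
    exp, log = [0] * 30, [0] * 16
    x = 1
    for i in range(15):
        exp[i] = exp[i + 15] = x
        log[x] = i
        x <<= 1
        if x & 0b10000:
            x ^= 0b10011
    return exp, log

EXP, LOG = gf16_tables()

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def error_locations(sigma, n=15):
    """Return every j in 0..n-1 for which sigma(alpha^(-j)) = 0.

    sigma holds the coefficients [sigma_0, sigma_1, ...] as field elements.
    """
    found = []
    for j in range(n):
        x = EXP[(n - j) % n]              # alpha^(-j)
        value, power = 0, 1
        for coeff in sigma:               # accumulate sigma_0 + sigma_1 x + sigma_2 x^2 + ...
            value ^= gf_mul(coeff, power)
            power = gf_mul(power, x)
        if value == 0:
            found.append(j)
    return found

sigma = [1, EXP[14], EXP[3]]              # 1 + alpha^14 X + alpha^3 X^2
print(error_locations(sigma))             # [0, 3]
```

Here the roots are $\alpha^{0}$ and $\alpha^{12} = \alpha^{-3}$, so the error location numbers are $\alpha^{0}$ and $\alpha^{3}$. Hardware decoders usually organize this exhaustive substitution as a Chien search, which is a name not used in the text above.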

The Berlekamp-Massey Decoding Algorithm for BCH Codes 

Several algorithms have been proposed for solution of Equation 7.10-14. Here we 
present the well-known Berlekamp-Massey algorithm due to Berlekamp (1968) and 
Massey (1969). Our presentation of this algorithm follows the presentation in Lin and 
Costello (2004). The interested reader is referred to Lin and Costello (2004), Berlekamp 
(1968), Peterson and Weldon (1972), MacWilliams and Sloane (1977), Blahut (2003), 
or Wicker (1995) for details and proofs. 

To implement the Berlekamp-Massey algorithm, we begin by finding a polynomial
of lowest degree $\sigma^{(1)}(X)$ that satisfies the first equation in 7.10-14. In the second step
we test to see if $\sigma^{(1)}(X)$ satisfies the second equation in 7.10-14. If it satisfies the
second equation, we set $\sigma^{(2)}(X) = \sigma^{(1)}(X)$. Otherwise, we introduce a correction term
to $\sigma^{(1)}(X)$ to obtain $\sigma^{(2)}(X)$, the polynomial of the lowest degree that satisfies the first
two equations. This process is continued until we obtain a polynomial of minimum
degree that satisfies all equations.

In general, if

$$\sigma^{(\mu)}(X) = \sigma^{(\mu)}_{l_\mu} X^{l_\mu} + \sigma^{(\mu)}_{l_\mu - 1} X^{l_\mu - 1} + \cdots + \sigma^{(\mu)}_2 X^2 + \sigma^{(\mu)}_1 X + 1 \tag{7.10-15}$$

is the polynomial of the lowest degree that satisfies the first $\mu$ equations in
Equation 7.10-14, to find $\sigma^{(\mu+1)}(X)$ we compute the $\mu$th discrepancy, denoted by
$d_\mu$ and given by

$$d_\mu = S_{\mu+1} + \sigma^{(\mu)}_1 S_\mu + \sigma^{(\mu)}_2 S_{\mu-1} + \cdots + \sigma^{(\mu)}_{l_\mu} S_{\mu+1-l_\mu} \tag{7.10-16}$$

If $d_\mu = 0$, no correction is necessary and $\sigma^{(\mu)}(X)$ also satisfies the $(\mu + 1)$st equation
in Equation 7.10-14. In this case we set

$$\sigma^{(\mu+1)}(X) = \sigma^{(\mu)}(X) \tag{7.10-17}$$

If $d_\mu \ne 0$, a correction is necessary. In this case $\sigma^{(\mu+1)}(X)$ is given by

$$\sigma^{(\mu+1)}(X) = \sigma^{(\mu)}(X) + d_\mu d_\rho^{-1} \sigma^{(\rho)}(X) X^{\mu-\rho} \tag{7.10-18}$$

where $\rho < \mu$ is selected such that $d_\rho \ne 0$ and among all such $\rho$'s the value of $\rho - l_\rho$
is maximum ($l_\rho$ is the degree of $\sigma^{(\rho)}(X)$).

The polynomial given by Equation 7.10-18 is the polynomial of the lowest degree
that satisfies the first $(\mu + 1)$ equations in Equation 7.10-14. This process is continued
until $\sigma^{(2t)}(X)$ is derived. The degree of this polynomial determines the number of errors,
and its roots can be used to locate the errors, as explained earlier. If the degree of $\sigma^{(2t)}(X)$
is higher than $t$, the number of errors in the received sequence is greater than $t$, and the
errors cannot be corrected.

The Berlekamp-Massey algorithm can be better carried out if we begin with a table 
such as Table 7.10-2. 

TABLE 7.10-2

The Berlekamp-Massey Algorithm

 $\mu$   $\sigma^{(\mu)}(X)$   $d_\mu$   $l_\mu$   $\mu - l_\mu$
 -1      1                     1         0         -1
  0      1                     $S_1$     0          0
  1      $1 + S_1 X$
  2
  ...
  $2t$

Example 7.10-4. Let us assume that the double-error-correcting BCH code designed
in Example 7.10-3 is considered, and the binary received sequence at the output of the
BSC is

y = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1)

The corresponding received polynomial is $y(X) = X^3 + 1$, and the syndrome computation
yields

$$\begin{aligned} S_1 &= \alpha^3 + 1 = \alpha^{14} \\ S_2 &= \alpha^6 + 1 = \alpha^{13} \\ S_3 &= \alpha^9 + 1 = \alpha^{7} \\ S_4 &= \alpha^{12} + 1 = \alpha^{11} \end{aligned} \tag{7.10-19}$$

where we have used Table 7.1-6. Now we have all we need to fill in the entries
of Table 7.10-2 by using Equations 7.10-16 to 7.10-18. The result is given in
Table 7.10-3.

TABLE 7.10-3

The Berlekamp-Massey Algorithm Implementation for Example 7.10-4

 $\mu$   $\sigma^{(\mu)}(X)$                    $d_\mu$         $l_\mu$   $\mu - l_\mu$
 -1      1                                      1               0         -1
  0      1                                      $\alpha^{14}$   0          0
  1      $1 + \alpha^{14} X$                    0               1          0
  2      $1 + \alpha^{14} X$                    $\alpha^{2}$    1          1
  3      $1 + \alpha^{14} X + \alpha^{3} X^2$   0               2          1
  4      $1 + \alpha^{14} X + \alpha^{3} X^2$                   2          2

Therefore $\sigma(X) = 1 + \alpha^{14} X + \alpha^{3} X^2$, and since the degree of this polynomial
is 2, this corresponds to a correctable error pattern. We can find the roots of $\sigma(X)$ by
inspection, i.e., by substituting the elements of GF($2^4$). This will give the two roots
of 1 and $\alpha^{12}$. Since the roots are the reciprocals of the error location numbers, we
conclude that the error location numbers are $\beta_1 = \alpha^0$ and $\beta_2 = \alpha^3$. From this the
errors are at locations $j_1 = 0$ and $j_2 = 3$. From Equation 7.10-9 the error polynomial
is $e(X) = 1 + X^3$, and $c(X) = y(X) + e(X) = 0$, i.e., the detected codeword is the
all-zero codeword.
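The table-filling procedure of Equations 7.10-16 to 7.10-18 can be written out as a short program. The sketch below is one possible arrangement, not the book's own code: it works over GF($2^4$) with $\alpha$ a root of $X^4 + X + 1$, keeps one row per value of $\mu$ exactly as in Table 7.10-2, and uses hypothetical function names. It is run on the syndromes of Example 7.10-4.

```python
def gf16_tables():
    """exp/log tables for GF(2^4), with alpha a root of X^4 + X + 1."""
    exp, log = [0] * 30, [0] * 16
    x = 1
    for i in range(15):
        exp[i] = exp[i + 15] = x
        log[x] = i
        x <<= 1
        if x & 0b10000:
            x ^= 0b10011
    return exp, log

EXP, LOG = gf16_tables()

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 15]

def berlekamp_massey(S, t):
    """S = [S_1, ..., S_2t]; returns [sigma_0, ..., sigma_l] of sigma^(2t)(X)."""
    # one row per mu: (sigma coefficients, discrepancy d_mu, degree l_mu)
    rows = {-1: ([1], 1, 0), 0: ([1], S[0], 0)}
    for mu in range(2 * t):
        sigma, d, l = rows[mu]
        new_sigma, new_l = list(sigma), l
        if d != 0:
            # rho < mu with d_rho != 0 and the largest rho - l_rho
            rho = max((r for r in rows if r < mu and rows[r][1] != 0),
                      key=lambda r: r - rows[r][2])
            sigma_rho, d_rho, l_rho = rows[rho]
            shift, scale = mu - rho, gf_div(d, d_rho)
            new_l = max(l, l_rho + shift)
            new_sigma += [0] * (new_l + 1 - len(new_sigma))
            for i, c in enumerate(sigma_rho):      # correction term of Eq. 7.10-18
                new_sigma[i + shift] ^= gf_mul(scale, c)
        d_next = None                              # the last row needs no discrepancy
        if mu + 1 < 2 * t:                         # Eq. 7.10-16 for the new row
            d_next = S[mu + 1]
            for i in range(1, new_l + 1):
                d_next ^= gf_mul(new_sigma[i], S[mu + 1 - i])
        rows[mu + 1] = (new_sigma, d_next, new_l)
    return rows[2 * t][0]

S = [EXP[14], EXP[13], EXP[7], EXP[11]]            # syndromes of Example 7.10-4
sigma = berlekamp_massey(S, t=2)
print([LOG[c] for c in sigma])                     # [0, 14, 3]
```

The printed exponents correspond to $\sigma(X) = 1 + \alpha^{14}X + \alpha^{3}X^{2}$, the last row of Table 7.10-3. The sketch does not handle the failure case in which the final degree exceeds $t$; a complete decoder would check for it before searching for roots.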


7.11 

REED-SOLOMON CODES 

Reed-Solomon (RS) codes are probably the most widely used codes in practice. These 
codes are used in communication systems and particularly data storage systems. Reed- 
Solomon codes are a special class of nonbinary BCH codes that were first introduced in 
Reed and Solomon (1960). As we have already seen, these codes achieve the Singleton 
bound and hence belong to the class of MDS codes. 

Recall that in construction of a binary BCH code of block length $n = 2^m - 1$,
we began by selecting a primitive element $\alpha$ in GF($2^m$) and then finding the minimal
polynomials of $\alpha^i$ for $1 \le i \le 2t$. The notion of the minimal polynomial as defined
in Section 7.1-1 was a special case of the general notion of minimal polynomial with
respect to a subfield. We defined the minimal polynomial of $\beta \in$ GF($2^m$) as a polynomial
of lowest degree over GF(2), where one of its roots is $\beta$. This is the definition of the minimal
polynomial with respect to GF(2). If we drop the restriction that the minimal polynomial
be defined over GF(2), we can have other minimal polynomials of lower degree. One
extreme case occurs when we define the minimal polynomial of $\beta \in$ GF($2^m$) with
respect to GF($2^m$). In this case we look for a polynomial of lowest degree over GF($2^m$)
whose root is $\beta$. Obviously $X + \beta$ is such a polynomial.

Reed-Solomon codes are $t$-error-correcting $2^m$-ary BCH codes with block length
$N = 2^m - 1$ symbols (i.e., $mN$ binary digits)†. To design a Reed-Solomon code, we
choose $\alpha \in$ GF($2^m$) to be a primitive element and find the minimal polynomials of $\alpha^i$,
for $1 \le i \le 2t$, over GF($2^m$). These polynomials are obviously of the form $X + \alpha^i$.
Hence, the generator polynomial $g(X)$ is given by

$$\begin{aligned} g(X) &= (X + \alpha)(X + \alpha^2)(X + \alpha^3) \cdots (X + \alpha^{2t}) \\ &= X^{2t} + g_{2t-1} X^{2t-1} + \cdots + g_1 X + g_0 \end{aligned} \tag{7.11-1}$$

where $g_i \in$ GF($2^m$) for $0 \le i \le 2t - 1$; i.e., $g(X)$ is a polynomial over GF($2^m$). Since
$\alpha^i$, for $1 \le i \le 2t$, are nonzero elements of GF($2^m$), they are all roots of $X^{2^m - 1} + 1$;
therefore $g(X)$ is a divisor of $X^{2^m - 1} + 1$, and it is the generator polynomial of a $2^m$-ary
code with block length $N = 2^m - 1$ and $N - K = 2t$. Note that the weight of $g(X)$
cannot be less than $D_{\min}$, the minimum distance of the code, which is, by Equation 7.10-1,
at least $2t + 1$. This means that none of the $g_i$'s in Equation 7.11-1 can be zero, and
therefore the minimum weight of the resulting code is equal to $2t + 1$. Therefore, for
this code

$$D_{\min} = 2t + 1 = N - K + 1 \tag{7.11-2}$$

which shows that the code is MDS.

From the discussion above, we conclude that Reed-Solomon codes are $2^m$-ary
$(2^m - 1, 2^m - 2t - 1)$ BCH codes with minimum distance $D_{\min} = 2t + 1$, where $m$ is
any positive integer greater than or equal to 3 and $1 \le t \le 2^{m-1} - 1$. Equivalently, we
can define Reed-Solomon codes in terms of $m$ and $D_{\min}$, the minimum distance of the
code, as $2^m$-ary BCH codes with $N = 2^m - 1$ and $K = N - D_{\min} + 1$, where
$3 \le D_{\min} \le N$.

Example 7.11-1. To design a triple-error-correcting Reed-Solomon code of length 
$N = 15$, we note that $N = 15 = 2^4 - 1$. Therefore, $m = 4$ and $t = 3$. We choose 
$\alpha \in$ GF($2^4$) to be a primitive element. Using Equation 7.11-1, we obtain 

$$
g(X) = (X + \alpha)(X + \alpha^2)(X + \alpha^3)(X + \alpha^4)(X + \alpha^5)(X + \alpha^6)
     = X^6 + \alpha^{10} X^5 + \alpha^{14} X^4 + \alpha^4 X^3 + \alpha^6 X^2 + \alpha^9 X + \alpha^6
$$

Since $N - K = 2t = 6$, this is a (15, 9) triple-error-correcting Reed-Solomon code over GF($2^4$). Codewords 
of this code have a block length of 15, where each component is a $2^4$-ary symbol. In 
binary representation the codewords have length 60. 
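
The expansion of the generator polynomial in this example can be checked numerically. The following is a minimal sketch, assuming that GF($2^4$) is built from the primitive polynomial $X^4 + X + 1$ (the text does not restate which polynomial it uses); the table and function names are illustrative only.

```python
# Sketch: expand g(X) = (X + a)(X + a^2)...(X + a^(2t)) over GF(2^4).
# Assumption: the field is generated by the primitive polynomial X^4 + X + 1.
PRIM_POLY = 0b10011
M = 4
ORDER = (1 << M) - 1  # 15 nonzero field elements

# Antilog and log tables: exp_table[i] = alpha^i as a 4-bit integer.
exp_table = [0] * (2 * ORDER)
log_table = [0] * (1 << M)
x = 1
for i in range(ORDER):
    exp_table[i] = x
    log_table[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY
for i in range(ORDER, 2 * ORDER):
    exp_table[i] = exp_table[i - ORDER]


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return exp_table[log_table[a] + log_table[b]]


def rs_generator(t):
    """Coefficients of g(X), lowest degree first, as field elements."""
    g = [1]
    for i in range(1, 2 * t + 1):
        root = exp_table[i]
        shifted = [0] + g                             # X * g(X)
        scaled = [gf_mul(c, root) for c in g] + [0]   # alpha^i * g(X)
        g = [s ^ r for s, r in zip(shifted, scaled)]  # addition in GF(2^m) is XOR
    return g


g = rs_generator(3)
# Exponent of alpha for each coefficient, from X^0 up to X^6.
g_exponents = [log_table[c] for c in g]
print(g_exponents)
```

Reading the printed exponents from the highest degree down gives the coefficients $1, \alpha^{10}, \alpha^{14}, \alpha^4, \alpha^6, \alpha^9, \alpha^6$ of the example.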

A popular Reed-Solomon code is the (255, 223) code over GF($2^8$). This code has a 
minimum distance of $D_{\min} = 255 - 223 + 1 = 33$ and is capable of correcting 16 symbol 
errors. If these errors are spread, in the worst possible scenario this code is capable of 
correcting 16 bit errors. On the other hand, if these errors occur as a cluster, i.e., if we 
have a burst of errors, this code can correct any burst of length $14 \times 8 + 2 = 114$ bits. 
Some bursts of length up to $16 \times 8 = 128$ errors can be corrected also by this code. 
That is the reason why Reed-Solomon codes are particularly attractive in channels with 
bursts of errors. Such channels include fading channels and storage channels in which 
scratches and manufacturing imperfections usually damage a sequence of bits. Reed-Solomon 
codes are also popular in concatenated coding schemes discussed later in this 
chapter. 

†In general, RS codes are defined on GF($p^m$). For Reed-Solomon codes we denote the block length by $N$ 
(symbols) and the number of information symbols by $K$. The minimum distance is denoted by $D_{\min}$. 

Since Reed-Solomon codes are BCH codes, any algorithm used for decoding BCH 
codes can be used for decoding Reed-Solomon codes. The Berlekamp-Massey algorithm, 
for instance, can be used for the decoding of Reed-Solomon codes. The only 
difference is that after locating the errors, we also have to determine the values of the 
errors. This step was not necessary in binary BCH codes, since in that case the value 
of any error is 1, which changes a 0 to a 1 and a 1 to a 0. In nonbinary BCH codes that is 
not the case. The value of an error can be any nonzero member of GF($2^m$) and has to be 
determined. The methods used to determine the value of errors are beyond the scope 
of our treatment. The interested reader is referred to Lin and Costello (2004). 

An interesting property of Reed-Solomon codes is that their weight enumeration 
polynomial is known. In general, the weight distribution of a Reed-Solomon code with 
symbols from GF($q$) and with block length $N = q - 1$ and minimum distance $D_{\min}$ is 
given by 
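
As a numerical sketch of such a weight distribution, the code below assumes the standard closed form for maximum-distance-separable (MDS) codes, of which Reed-Solomon codes are an instance: $A_0 = 1$ and $A_i = \binom{N}{i}(q-1)\sum_{j=0}^{i-D_{\min}}(-1)^j\binom{i-1}{j}q^{\,i-j-D_{\min}}$ for $D_{\min} \le i \le N$. The function name and the choice of the (15, 9) code over GF(16) as a test case are illustrative.

```python
from math import comb


def mds_weight_distribution(q, N, d_min):
    """Number of codewords A[i] of each weight i for an MDS code of
    length N and minimum distance d_min over GF(q) (assumed closed form)."""
    A = [0] * (N + 1)
    A[0] = 1  # the all-zero codeword
    for i in range(d_min, N + 1):
        inner = sum(
            (-1) ** j * comb(i - 1, j) * q ** (i - j - d_min)
            for j in range(i - d_min + 1)
        )
        A[i] = comb(N, i) * (q - 1) * inner
    return A


# (15, 9) Reed-Solomon code over GF(16): D_min = N - K + 1 = 7.
A = mds_weight_distribution(q=16, N=15, d_min=7)
print(A[7], sum(A) == 16 ** 9)
```

A useful sanity check on any such formula is that the weights sum to the total number of codewords, $q^K$.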


A nonbinary code is particularly matched to an $M$-ary modulation technique for 
transmitting the $2^m$ possible symbols. Specifically, $M$-ary orthogonal signaling, e.g., 
$M$-ary FSK, is frequently used. Each of the $2^m$ symbols in the $2^m$-ary alphabet is mapped 
to one of the $M = 2^m$ orthogonal signals. Thus, the transmission of a codeword is 
accomplished by transmitting $N$ orthogonal signals, where each signal is selected from 
the set of $M = 2^m$ possible signals. 

The optimum demodulator for such a signal corrupted by AWGN consists of M 
matched filters (or cross-correlators) whose outputs are passed to the decoder, either 
in the form of soft decisions or in the form of hard decisions. If hard decisions are 
made by the demodulator, the symbol error probability $P_M$ and the code parameters 
are sufficient to characterize the performance of the decoder. In fact, the modulator, 
the AWGN channel, and the demodulator form an equivalent discrete ($M$-ary) input, 
discrete ($M$-ary) output, symmetric memoryless channel characterized by the transition 
probabilities $P_c = 1 - P_M$ and $P_M/(M - 1)$. This channel model, which is illustrated 
in Figure 7.11-1, is a generalization of the BSC. 

The performance of the hard decision decoder may be characterized by the following 
upper bound on the codeword error probability: 

$$
P_e \le \sum_{i=t+1}^{N} \binom{N}{i} P_M^i (1 - P_M)^{N-i} \tag{7.11-5}
$$

where $t$ is the number of errors guaranteed to be corrected by the code. 




FIGURE 7.11-1
An $M$-ary input, $M$-ary output, symmetric memoryless channel. 


When a codeword error is made, the corresponding symbol error probability is 

$$
P_{es} = \frac{1}{N} \sum_{i=t+1}^{N} i \binom{N}{i} P_M^i (1 - P_M)^{N-i} \tag{7.11-6}
$$

Furthermore, if the symbols are converted to binary digits, the bit error probability 
corresponding to Equation 7.11-6 is 

$$
P_{eb} = \frac{2^{m-1}}{2^m - 1} P_{es} \tag{7.11-7}
$$


Example 7.11-2. Let us evaluate the performance of an $N = 2^5 - 1 = 31$ Reed-Solomon 
code with $D_{\min} = 3$, 5, 9, and 17. The corresponding values of $K$ are 29, 27, 
23, and 15. The modulation is $M = 32$ orthogonal FSK with noncoherent detection at 
the receiver. The probability of a symbol error is given by Equation 4.5-44 and may 
be expressed as 

$$
P_M = \frac{1}{M} e^{-\gamma} \sum_{i=2}^{M} (-1)^i \binom{M}{i} e^{\gamma/i} \tag{7.11-8}
$$

where $\gamma$ is the SNR per code symbol. By using Equation 7.11-8 in Equation 7.11-6 
and combining the result with Equation 7.11-7, we obtain the bit error probability. The 
results of these computations are plotted in Figure 7.11-2. Note that the more powerful 
codes (large $D_{\min}$) give poorer performance at low SNR per bit than the weaker codes. 
On the other hand, at high SNR, the more powerful codes give better performance. 
Hence, there are crossovers among the various codes, as illustrated, for example, in 
Figure 7.11-2 for the $t = 1$ and $t = 8$ codes. Crossovers also occur among the $t = 1$, 2, 
and 4 codes at smaller values of SNR per bit. Similarly, the curves for $t = 4$ and 8 and 
for $t = 8$ and 2 cross in the region of high SNR. This is the characteristic behavior for 
noncoherent detection of the coded waveforms. 
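
The chain of computations in this example can be sketched in a few lines. The code below assumes the forms $P_M = \frac{1}{M} e^{-\gamma}\sum_{i=2}^{M}(-1)^i\binom{M}{i}e^{\gamma/i}$ for Equation 7.11-8, $P_{es} = \frac{1}{N}\sum_{i=t+1}^{N} i\binom{N}{i}P_M^i(1-P_M)^{N-i}$ for Equation 7.11-6, and $P_{eb} = \frac{2^{m-1}}{2^m-1}P_{es}$ for Equation 7.11-7; the function names are illustrative, and $\gamma$ is the SNR per code symbol (a plot against SNR per bit would additionally require scaling by the code rate and by $m$).

```python
from math import comb, exp


def symbol_error_prob(gamma, M=32):
    """Symbol error probability of noncoherent M-ary orthogonal FSK
    (assumed form of Equation 7.11-8); gamma is the SNR per code symbol."""
    total = sum((-1) ** i * comb(M, i) * exp(gamma / i) for i in range(2, M + 1))
    return exp(-gamma) * total / M


def rs_bit_error_prob(gamma, m=5, t=1):
    """Bit error probability after hard decision decoding of an
    N = 2^m - 1 Reed-Solomon code correcting t symbol errors."""
    N = 2 ** m - 1
    p = symbol_error_prob(gamma, M=2 ** m)
    # Post-decoding symbol error probability (assumed form of Equation 7.11-6).
    p_es = sum(
        i * comb(N, i) * p ** i * (1 - p) ** (N - i) for i in range(t + 1, N + 1)
    ) / N
    # Symbol-to-bit conversion (assumed form of Equation 7.11-7).
    return 2 ** (m - 1) / (2 ** m - 1) * p_es


for t in (1, 2, 4, 8):
    print(t, rs_bit_error_prob(10.0, t=t))
```

At a fixed SNR per code symbol the error probability can only decrease as $t$ grows; the crossovers described in the example appear once the curves are replotted against SNR per bit, where the lower-rate codes pay a rate penalty.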


If the demodulator does not make a hard decision on each symbol, but instead 
passes the unquantized matched filter outputs to the decoder, soft decision decoding 
can be performed. This decoding involves the formation of $q^K = 2^{mK}$ correlation 
metrics, where each metric corresponds to one of the $q^K$ codewords and consists of a 
sum of $N$ matched filter outputs corresponding to the $N$ code symbols. The matched 
filter outputs may be added coherently, or they may be envelope-detected and then 
added, or they may be square-law-detected and then added. If coherent detection is 
used and the channel noise is AWGN, the computation of the probability of error is a 
straightforward extension of the binary case considered in Section 7.4. On the other 
hand, when envelope detection or square-law detection and noncoherent combining 
are used to form the decision variables, the computation of the decoder performance is 
considerably more complicated. 

FIGURE 7.11-2
Performance of several $N = 31$, $t$-error-correcting Reed-Solomon codes with 32-ary FSK 
modulation on an AWGN channel (noncoherent demodulation). 


7.12 CODING FOR CHANNELS WITH BURST ERRORS 

Most of the well-known codes that have been devised for increasing reliability in the 
transmission of information are effective when the errors caused by the channel are 
statistically independent. This is the case for the AWGN channel. However, there are 
channels that exhibit bursty error characteristics. One example is the class of channels 
characterized by multipath and fading, which is described in detail in Chapter 13. Signal 
fading due to time-variant multipath propagation often causes the signal to fall below 
the noise level, thus resulting in a large number of errors. A second example is the class 
of magnetic recording channels (tape or disk) in which defects in the recording media 
result in clusters of errors. Such error clusters are not usually corrected by codes that 
are optimally designed for statistically independent errors. 

Some of the codes designed for random error correction, i.e., nonburst errors, have 
the capability of burst error correction. A notable example is Reed-Solomon codes, which 
can easily correct long bursts of errors because such long error bursts result in only a few 
symbol errors. Considerable work has been done on the 
construction of codes that are capable of correcting burst errors. Probably the best-known 
burst error correcting codes are the subclass of cyclic codes called Fire codes, 
named after P. Fire (1959), who discovered them. Another class of cyclic codes 
for burst error correction was subsequently discovered by Burton (1969). 

A burst of errors of length $b$ is defined as a sequence of $b$-bit errors, the first and 
last of which are 1. The burst error correction capability of a code is defined as 1 less 
than the length of the shortest uncorrectable burst. It is relatively easy to show that a 
systematic ($n$, $k$) code, which has $n - k$ parity check bits, can correct bursts of length 
$b \le \lfloor \frac{1}{2}(n - k) \rfloor$. 

FIGURE 7.12-1
Block diagram of system employing interleaving for burst error channel. 


An effective method for dealing with burst error channels is to interleave the coded 
data in such a way that the bursty channel is transformed to a channel having independent 
errors. Thus, a code designed for independent channel errors (short bursts) is used. 

A block diagram of a system that employs interleaving is shown in Figure 7.12-1. 
The encoded data are reordered by the interleaver and transmitted over the channel. At 
the receiver, after either hard or soft decision demodulation, the deinterleaver puts the 
data in proper sequence and passes them to the decoder. As a result of the interleav- 
ing/deinterleaving, error bursts are spread out in time so that errors within a codeword 
appear to be independent. 

The interleaver can take one of two forms: a block structure or a convolutional structure. 
A block interleaver formats the encoded data in a rectangular array of $m$ rows and 
$n$ columns. Usually, each row of the array constitutes a codeword of length $n$. An interleaver 
of degree $m$ consists of $m$ rows ($m$ codewords) as illustrated in Figure 7.12-2. 
The bits are read out columnwise and transmitted over the channel. At the receiver, the 
deinterleaver stores the data in the same rectangular array format, but they are read out 
rowwise, one codeword at a time. As a result of this reordering of the data during transmission, 
a burst of errors of length $l = mb$ is broken up into $m$ bursts of length $b$. Thus, 
an ($n$, $k$) code that can handle burst errors of length $b \le \lfloor \frac{1}{2}(n - k) \rfloor$ can be combined 
with an interleaver of degree $m$ to create an interleaved ($mn$, $mk$) block code that can 
handle bursts of length $mb$. A convolutional interleaver can be used in place of a block 
interleaver in much the same way. Convolutional interleavers are better matched for 
use with the class of convolutional codes that is described in Chapter 8. Convolutional 
interleaver structures have been described by Ramsey (1970) and Forney (1971). 

FIGURE 7.12-2
A block interleaver for coded data. 
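
The row-in, column-out reordering of a block interleaver can be illustrated with a short sketch. The function names and the small array size ($m = 4$ codewords of length $n = 7$) are assumptions chosen for illustration; the point is only that a channel burst of length $mb$ leaves $b$ errors in each deinterleaved codeword.

```python
def interleave(bits, m, n):
    """Write m codewords of length n as rows; read the array out columnwise."""
    assert len(bits) == m * n
    return [bits[r * n + c] for c in range(n) for r in range(m)]


def deinterleave(bits, m, n):
    """Store the received stream columnwise; read it out rowwise."""
    assert len(bits) == m * n
    out = [0] * (m * n)
    idx = 0
    for c in range(n):
        for r in range(m):
            out[r * n + c] = bits[idx]
            idx += 1
    return out


m, n, b = 4, 7, 2
# A channel burst of length m*b, expressed as an error pattern on the
# interleaved stream (1 marks a corrupted position).
channel_errors = [0] * (m * n)
channel_errors[5:5 + m * b] = [1] * (m * b)

# After deinterleaving, count the errors that land in each codeword (row).
spread = deinterleave(channel_errors, m, n)
errors_per_codeword = [sum(spread[r * n:(r + 1) * n]) for r in range(m)]
print(errors_per_codeword)
```

Each row receives exactly $b$ errors in adjacent positions, so a constituent code able to correct bursts of length $b$ handles the full channel burst.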

7.13 COMBINING CODES 

The performance of a block code depends mainly on the number of errors it can correct, 
which is a function of the minimum distance of the code. For a given rate $R_c$, one 
can design codes with different block lengths. Codes with longer block lengths offer 
the possibility of higher minimum distances and thus higher error correction capability. 
This is clearly seen from the different bounds on the minimum distance derived 
in Section 7.7. The problem, however, is that the decoding complexity of a block 
code generally increases with the block length, and this dependence is in general 
exponential. Therefore improved performance through using block codes 
is achieved at the cost of increased decoding complexity. 

One approach to design block codes with long block lengths and with manageable 
complexity is to begin with two or more simple codes with short block lengths and 
combine them in a certain way to obtain codes with longer block length that have 
better distance properties. Then some kind of suboptimal decoding can be applied to 
the combined code based on the decoding algorithms of the simple constituent codes. 


7.13-1 Product Codes 


A simple method of combining two or more codes is described in this section. The 
resulting codes are called product codes, first studied by Elias (1954). Let us assume 
we have two systematic linear block codes; code $C_i$ is an ($n_i$, $k_i$) code with minimum 
distance $d_{\min i}$ for $i = 1, 2$. The product of these codes is an ($n_1 n_2$, $k_1 k_2$) linear block 
code whose bits are arranged in a matrix form as shown in Figure 7.13-1. 

The $k_1 k_2$ information bits are put in a rectangle with width $k_1$ and height $k_2$. The $k_1$ 
bits in each row of this matrix are encoded using the encoder for code $C_1$, and the $k_2$ bits 
in each column are encoded using the encoder for code $C_2$. The $(n_1 - k_1) \times (n_2 - k_2)$ bits 
in the lower right rectangle can be obtained either from encoding the bottom $n_2 - k_2$ 
rows using the encoding rule for $C_1$ or from encoding the rightmost $n_1 - k_1$ columns 
using the encoding rule for $C_2$. It is shown in Problem 7.63 that the results of these two 
approaches are the same. 

FIGURE 7.13-1
The structure of a product code. 
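
The encoding procedure can be sketched with the simplest possible constituent codes. In the example below both $C_1$ and $C_2$ are assumed to be single-parity-check codes (one even-parity bit appended to the information bits); this choice, and the function names, are illustrative only. The sketch fills the array in both orders and compares the results, including the lower right corner.

```python
def spc_encode(bits):
    """Systematic single-parity-check encoder: append an even-parity bit."""
    return bits + [sum(bits) % 2]


def encode_rows_then_columns(info):
    """Encode each row with C1, then each column of the result with C2."""
    rows = [spc_encode(r) for r in info]
    n1 = len(rows[0])
    cols = [spc_encode([row[c] for row in rows]) for c in range(n1)]
    n2 = len(cols[0])
    return [[cols[c][r] for c in range(n1)] for r in range(n2)]


def encode_columns_then_rows(info):
    """Encode each column with C2, then each row of the result with C1."""
    k2, k1 = len(info), len(info[0])
    cols = [spc_encode([info[r][c] for r in range(k2)]) for c in range(k1)]
    n2 = len(cols[0])
    rows = [[cols[c][r] for c in range(k1)] for r in range(n2)]
    return [spc_encode(r) for r in rows]


info = [[1, 0, 1],
        [0, 1, 1]]  # k2 = 2 rows, k1 = 3 columns
a = encode_rows_then_columns(info)
b = encode_columns_then_rows(info)
print(a == b)
for row in a:
    print(row)
```

Both orders produce the same $n_2 \times n_1$ array, which is the property established in general for linear constituent codes in Problem 7.63.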

The resulting code is an ($n_1 n_2$, $k_1 k_2$) systematic linear block code. The rate of the 
product code is obviously the product of the rates of its component codes. Moreover, 
it can be shown that the minimum distance of the product code is the product of the 
minimum distances of the component codes, i.e., $d_{\min} = d_{\min 1} d_{\min 2}$ (see Problem 7.64), 
and hence the product code is capable of correcting 

$$
\left\lfloor \frac{d_{\min 1} d_{\min 2} - 1}{2} \right\rfloor \tag{7.13-1}
$$

errors using a complex optimal decoding scheme. 

We can design a simpler decoding scheme based on the decoding rules of the two 
constituent codes as follows. Let us assume 

$$
t_i = \left\lfloor \frac{d_{\min i} - 1}{2} \right\rfloor, \qquad i = 1, 2 \tag{7.13-2}
$$

is the number of errors that code $C_i$ can correct. Now let us assume in transmission of 
the $n_1 n_2$ binary digits of a codeword that fewer than $(t_1 + 1)(t_2 + 1)$ errors have occurred. 
Regardless of the location of errors, the number of rows of the product code shown in 
Figure 7.13-1 that have more than $t_1$ errors is less than or equal to $t_2$, because otherwise 
the total number of errors would be $(t_1 + 1)(t_2 + 1)$ or higher. Since each row having less 
than $t_1 + 1$ errors can be fully recovered using the decoding algorithm of $C_1$, if we do 
rowwise decoding, we will have at most $t_2$ rows decoded erroneously. This means that 
after this stage of decoding the number of errors in each column cannot exceed $t_2$, all 
of which can be corrected using the decoding algorithm for $C_2$ on columns. Therefore, 
using this simple two-stage decoding algorithm, we can correct up to 

$$
\tau = (t_1 + 1)(t_2 + 1) - 1 = t_1 t_2 + t_1 + t_2 \tag{7.13-3}
$$

errors. 


Example 7.13-1. Consider a (255, 123) BCH code with $d_{\min 1} = 39$ and $t_1 = 19$ and 
a (15, 7) BCH code with $d_{\min 2} = 5$ and $t_2 = 2$ (see Example 7.10-3). The product of 
these codes has a minimum distance of $39 \times 5 = 195$ and can correct up to 97 errors if a 
complex decoding algorithm is employed to take advantage of the full error-correcting 
capability of the code. A two-stage decoding algorithm can, however, correct up to 
$(19 + 1)(2 + 1) - 1 = 59$ errors at noticeably lower complexity. 
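
The arithmetic of this example is easy to reproduce for other pairs of constituent codes. The helper below is a small sketch (its name and return convention are illustrative) that evaluates the product minimum distance, the full error-correcting capability $\lfloor (d_{\min 1} d_{\min 2} - 1)/2 \rfloor$, and the two-stage capability $(t_1 + 1)(t_2 + 1) - 1$.

```python
def product_code_capability(d_min1, d_min2):
    """Return (minimum distance, errors correctable by optimal decoding,
    errors correctable by two-stage row/column decoding)."""
    t1 = (d_min1 - 1) // 2
    t2 = (d_min2 - 1) // 2
    d_min = d_min1 * d_min2
    return d_min, (d_min - 1) // 2, (t1 + 1) * (t2 + 1) - 1


print(product_code_capability(39, 5))
```

The gap between the last two numbers is the price paid for the simpler two-stage decoder.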

Another decoding algorithm, similar to how a crossword puzzle is solved, can also 
be used for decoding product codes. Using the row codes, we can come up with the best 
guess for the bit values; and then using the column codes, we can improve these guesses. 
This process can be repeated in an iterative fashion, improving the quality of the guess 
in each step. This process is known as iterative decoding. To employ this decoding 
procedure, we need decoding 
schemes for the row and column codes that are capable of providing guesses about 
each individual bit. In other words, decoding schemes with soft outputs (usually, the 
likelihood values) are desirable. We will describe such decoding procedures in our 
discussion of turbo codes in Chapter 8. 


7.13-2 Concatenated Codes 

In concatenated coding, two codes, one binary and one nonbinary, are concatenated such 
that the codewords of the binary code are treated as symbols of the nonbinary code. 
The combination of the binary channel and the binary encoder and decoder appears 
as a nonbinary channel to the nonbinary encoder and decoder. The binary code that is 
directly connected to the binary channel is called the inner code, and the nonbinary 
code that operates on the combination of binary encoder/binary channel/binary decoder 
is called the outer code. 

To be more specific, let us consider the concatenated coding scheme shown in 
Figure 7.13-2. The nonbinary ($N$, $K$) code forms the outer code, and the binary code 
forms the inner code. Codewords are formed by subdividing a block of $kK$ information 
bits into $K$ groups, called symbols, where each symbol consists of $k$ bits. The $K$ $k$-bit 
symbols are encoded into $N$ $k$-bit symbols by the outer encoder, as is usually done 
with a nonbinary code. The inner encoder takes each $k$-bit symbol and encodes it into 
a binary block code of length $n$. Thus we obtain a concatenated block code having a 
block length of $Nn$ bits and containing $kK$ information bits. That is, we have created 
an equivalent ($Nn$, $Kk$) long binary code. The bits in each codeword are transmitted 
over the channel by means of PSK or, perhaps, by FSK. 

We also indicate that the minimum distance of the concatenated code is at least $d_{\min} D_{\min}$, 
where $D_{\min}$ is the minimum distance of the outer code and $d_{\min}$ is the minimum distance 
of the inner code. Furthermore, the rate of the concatenated code is $Kk/Nn$, which is 
equal to the product of the two code rates. 

A hard decision decoder for a concatenated code is conveniently separated into an 
inner decoder and an outer decoder. The inner decoder takes the hard decisions on each 
group of $n$ bits, corresponding to a codeword of the inner code, and makes a decision on 
the $k$ information bits based on maximum-likelihood (minimum-distance) decoding. 
These $k$ bits represent one symbol of the outer code. When a block of $N$ $k$-bit symbols 
is received from the inner decoder, the outer decoder makes a hard decision on the 
$K$ $k$-bit symbols based on maximum-likelihood decoding. 



FIGURE 7.13-2
A concatenated coding scheme. 

Soft decision decoding is also a possible alternative with a concatenated code. 
Usually, the soft decision decoding is performed on the inner code, if it is selected to 
have relatively few codewords, i.e., if $2^k$ is not too large. The outer code is usually 
decoded by means of hard decision decoding, especially if the block length is long 
and there are many codewords. On the other hand, there may be a significant gain in 
performance when soft decision decoding is used on both the outer and inner codes, to 
justify the additional decoding complexity. This is the case in digital communications 
over fading channels, as we shall demonstrate in Chapter 14. 

Example 7.13-2. Suppose that the (7, 4) Hamming code is used as the inner code in 
a concatenated code in which the outer code is a Reed-Solomon code. Since $k = 4$, we 
select the length of the Reed-Solomon code to be $N = 2^4 - 1 = 15$. The number of 
information symbols $K$ per outer codeword may be selected over the range $1 \le K \le 14$ 
in order to achieve a desired code rate. 
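
To make the trade-off over $K$ concrete, the sketch below tabulates the parameters of the equivalent binary code for this example. It assumes a Reed-Solomon outer code, so that $D_{\min} = N - K + 1$, and uses the product $d_{\min} D_{\min}$ as a lower bound on the concatenated minimum distance; the function name and dictionary keys are illustrative.

```python
from fractions import Fraction


def concatenated_parameters(N, K, n, k, d_inner):
    """Parameters of an (N, K) Reed-Solomon outer code over GF(2^k)
    concatenated with an (n, k) binary inner code of distance d_inner."""
    D_outer = N - K + 1  # Reed-Solomon codes are MDS
    return {
        "block_length": N * n,         # Nn bits
        "info_bits": K * k,            # Kk bits
        "rate": Fraction(K * k, N * n),
        "min_distance_bound": d_inner * D_outer,
    }


# (7, 4) Hamming inner code (d_min = 3) with a (15, 9) Reed-Solomon outer code.
params = concatenated_parameters(N=15, K=9, n=7, k=4, d_inner=3)
print(params)
```

Lowering $K$ raises the distance bound and lowers the rate; for $K = 9$ the result is an equivalent (105, 36) binary code.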

Concatenated codes with Reed-Solomon codes as the outer code and binary con- 
volutional codes as the inner code have been widely used in the design of deep space 
communication systems. More details on concatenated codes can be found in the book 
by Forney (1966a). 

Serial and Parallel Concatenation with Interleavers 

An interleaver may be used in conjunction with a concatenated code to construct a 
code with extremely long codewords. In a serially concatenated block code (SCBC), 
the interleaver is inserted between the two encoders as shown in Figure 7.13-3. Both 
codes are linear systematic binary codes. The outer code is a ($p$, $k$) code, and the inner 
code is an ($n$, $p$) code. The block interleaver length is selected as $N = mp$, where $m$ is 
a usually large positive integer that determines the overall block length. The encoding 
and interleaving are performed as follows: $mk$ information bits are encoded by the 
outer encoder to produce $mp$ coded bits. These $N = mp$ coded bits are read out of the 
interleaver in a different order according to the permutation algorithm of the interleaver. 
The $mp$ bits at the output of the interleaver are fed to the inner encoder in blocks of 
length $p$. Therefore, a block of $mk$ information bits is encoded by the SCBC into a 
block of $mn$ bits. The resulting code rate is $R_{sc} = k/n$, which is the product of the code 
rates of the inner and outer encoders. However, the block length of the SCBC is $nm$ 
bits, which can be significantly larger than the block length of the conventional serial 
concatenation of the block codes without the use of the interleaver. 

The block interleaver is usually implemented as a pseudorandom interleaver, i.e., 
an interleaver that pseudorandomly permutes the block of $N$ bits. For purposes of 
analyzing the performance of SCBC, such an interleaver may be modeled as a uniform 
interleaver, which is defined as a device that maps a given input word of weight $w$ 
into all distinct $\binom{N}{w}$ permutations with equal probability. This operation is similar to 
Shannon's random coding argument, where here the average performance is measured 
over all possible interleavers of length $N$. 

FIGURE 7.13-3
Serial concatenated block code with interleaver. 

FIGURE 7.13-4
Parallel concatenated block code (PCBC) with interleaver. The encoder outputs are 
$mk$ information bits, $m(n_1 - k)$ parity check bits, and $m(n_2 - k)$ parity check bits. 
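
A pseudorandom block interleaver amounts to a fixed, seeded permutation of the $N$ positions. The sketch below uses Python's `random.Random` with an arbitrary seed as the permutation source; the seed, the function names, and the block length are assumptions for illustration and are not tied to any particular standard.

```python
import random


def make_interleaver(N, seed=0):
    """Return a pseudorandom permutation of range(N), fixed by the seed."""
    rng = random.Random(seed)
    perm = list(range(N))
    rng.shuffle(perm)
    return perm


def interleave(bits, perm):
    """Output position j carries input bit perm[j]."""
    return [bits[p] for p in perm]


def deinterleave(bits, perm):
    """Invert interleave(): put the bit at output position j back at perm[j]."""
    out = [0] * len(perm)
    for j, p in enumerate(perm):
        out[p] = bits[j]
    return out


perm = make_interleaver(12, seed=1)
word = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]  # input word of weight w = 3
shuffled = interleave(word, perm)
print(shuffled, sum(shuffled))
```

The interleaver changes only the positions of the ones, never the weight $w$, which is why the uniform interleaver model averages over the $\binom{N}{w}$ possible placements.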

By use of interleaving, parallel concatenated block codes (PCBCs) can be constructed 
in a similar manner. Figure 7.13-4 illustrates the basic configuration of such 
an encoder based on two constituent binary codes. The constituent codes may be identical 
or different. The two encoders are systematic, binary linear encoders, denoted as 
($n_1$, $k$) and ($n_2$, $k$). The pseudorandom block interleaver has length $N = k$, and thus 
the overall PCBC has block length $n_1 + n_2 - k$ and rate $k/(n_1 + n_2 - k)$, since the 
information bits are transmitted only once. More generally, we may encode $mk$ bits 
($m > 1$) and thus use an interleaver of length $N = mk$. The design of interleavers 
for parallel concatenated codes is considered in a paper by Daneshgaran and Mondin 
(1999). 
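
The structure of a PCBC encoder can be sketched with two toy constituent encoders. Both encoders below are hypothetical stand-ins chosen only to keep the example short: the first appends a single overall parity bit, giving a ($k + 1$, $k$) code, and the second produces a running parity of its input, giving a ($2k$, $k$) code. The permutation is likewise an arbitrary example.

```python
def parity_bits_a(info):
    """Hypothetical constituent encoder 1: one overall even-parity bit."""
    return [sum(info) % 2]


def parity_bits_b(info):
    """Hypothetical constituent encoder 2: running (accumulated) parity."""
    out, acc = [], 0
    for bit in info:
        acc ^= bit
        out.append(acc)
    return out


def pcbc_encode(info, perm):
    """Send the information bits once, followed by the parity bits of
    encoder 1 (natural order) and encoder 2 (interleaved order)."""
    interleaved = [info[p] for p in perm]
    return info + parity_bits_a(info) + parity_bits_b(interleaved)


info = [1, 0, 1, 1]      # k = 4
perm = [2, 0, 3, 1]      # interleaver of length N = k
codeword = pcbc_encode(info, perm)
print(codeword, len(codeword))
```

Here $n_1 = 5$ and $n_2 = 8$, so the block length is $n_1 + n_2 - k = 9$ and the rate is $4/9$, in agreement with the expressions in the text.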

The use of an interleaver in the construction of SCBC and PCBC results in codewords 
that are both large in block length and relatively sparse. Decoding of these types 
of codes is generally performed iteratively, using soft-in/soft-out (SISO) maximum a 
posteriori probability (MAP) algorithms. An iterative MAP decoding algorithm for 
serially concatenated codes is described in the paper by Benedetto et al. (1998). Iterative 
MAP decoding algorithms for parallel concatenated codes have been described 
in a number of papers, including Berrou et al. (1993), Benedetto and Montorsi (1996), 
Hagenauer et al. (1996) and in the book by Heegard and Wicker (1999). The combination 
of code concatenation with interleaving and iterative MAP decoding results in 
performance very close to the Shannon limit at moderate error rates, such as $10^{-4}$ to 
$10^{-5}$ (low SNR region). More details on this type of concatenation will be given in 
Chapter 8. 

The pioneering work on coding and coded waveforms for digital communications was 
done by Shannon (1948), Hamming (1950), and Golay (1949). These works were 
rapidly followed with papers on code performance by Gilbert (1952), new codes by 
Muller (1954) and Reed (1954), and coding techniques for noisy channels by Elias 
(1954, 1955) and Slepian (1956). During the period 1960-1970, there were a num- 
ber of significant contributions in the development of coding theory and decoding 
algorithms. In particular, we cite the papers by Reed and Solomon (1960) on Reed- 
Solomon codes, the papers by Hocquenghem (1959) and Bose and Ray-Chaudhuri 
(1960) on BCH codes, and the Ph.D. dissertation of Forney (1966) on concatenated 
codes. These works were followed by the papers of Goppa (1970, 1971) on the con- 
struction of a new class of linear cyclic codes, now called Goppa codes [see also 
Berlekamp (1973)], and the paper of Justesen (1972) on a constructive technique for 
asymptotically good codes. During this period, work on decoding algorithms was pri- 
marily focused on BCH codes. The first decoding algorithm for binary BCH codes 
was developed by Peterson (1960). A number of refinements and generalizations by 
Chien (1964), Forney (1965), Massey (1965), and Berlekamp (1968) led to the development 
of the Berlekamp-Massey algorithm described in detail in Lin and Costello 
(2004) and Wicker (1995). A treatment of Reed-Solomon codes is given in the book by 
Wicker and Bhargava (1994). 

In addition to the references given above on coding, decoding, and coded signal 
design, we should mention the collection of papers published by the IEEE Press entitled 
Key Papers in the Development of Coding Theory, edited by Berlekamp (1974). This 
book contains important papers that were published in the first 25 years of the develop- 
ment of coding theory. We should also cite the Special Issue on Error-Correcting Codes, 
IEEE Transactions on Communications (October 1971). Finally, the survey papers by 
Calderbank (1998), Costello et al. (1998), and Forney and Ungerboeck (1998) highlight 
the major developments in coding and decoding over the past 50 years and include a 
large number of references. Standard textbooks on this subject include those by Lin 
and Costello (2004), MacWilliams and Sloane (1977), Blahut (2003), Wicker (1995), 
and Berlekamp (1968). 


PROBLEMS 

7.1 From the definition of a Galois field GF($q$) we know that $\{F - \{0\}, \cdot, 1\}$ is an Abelian 
group with $q - 1$ elements. 

1. Let $a \in \{F - \{0\}, \cdot, 1\}$ and define $a^i = a \cdot a \cdot a \cdots a$ ($i$ times). Show that for some positive $j$ 
we have $a^j = 1$ and $a^i \ne 1$ for all $0 < i < j$, where $j$ is called the order of $a$. 

2. Show that if $0 < i < i' \le j$, then $a^i$ and $a^{i'}$ are distinct elements of $\{F - \{0\}, \cdot, 1\}$. 

3. Show that $G_a = \{a, a^2, a^3, \ldots, a^j\}$ is an Abelian group under multiplication; $G_a$ is 
called the cyclic subgroup of element $a$. 

4. Let us assume that a $b \in \{F - \{0\}, \cdot, 1\}$ exists such that $b \notin G_a$. Show that $G_{ba} = 
\{b \cdot a, b \cdot a^2, \ldots, b \cdot a^j\}$ is an Abelian group and $G_a \cap G_{ba} = \emptyset$. Therefore, if such a $b$ 
exists, the number of elements in $\{F - \{0\}, \cdot, 1\}$ is at least $2j$, and $G_{ba}$ is called a coset 
of $G_a$. 

5. Use the argument of part 4 to prove that the nonzero elements of GF($q$) can be written 
as the union of disjoint cosets, and hence the order of any element of GF($q$) divides 
$q - 1$. 

6. Conclude that for any nonzero $\beta \in$ GF($q$) we have $\beta^{q-1} = 1$. 

7.2 Use the result of Problem 7.1 to prove that the $q$ elements of GF($q$) are the roots of equation 

$$
X^q - X = 0
$$

7.3 Construct the addition and multiplication tables of GF(5). 

7.4 List all prime polynomials of degrees 2 and 3 over GF(3). Using a prime polynomial of 
degree 2, generate the multiplication table of GF(9). 

7.5 List all primitive elements in GF(8). How many primitive elements are in GF(32)? 

7.6 Let $\alpha \in$ GF($2^4$) be a primitive element. Show that $\{0, 1, \alpha^5, \alpha^{10}\}$ is a field. From this 
conclude that GF(4) is a subfield of GF(16). 

7.7 Show that GF(4) is not a subfield of GF(32). 

7.8 Using Table 7.1-5, generate GF(32) and express its elements in polynomial, power, and 
vector form. Find the minimal polynomials of $\beta = \alpha^3$ and $\gamma = \alpha^3 + \alpha$, where $\alpha$ is a 
primitive element. 

7.9 Let $\beta \in$ GF($p^m$) be a nonzero element. Show that 

$$
\sum_{i=1}^{p} \beta = 0
$$

and 

$$
\sum_{i=1}^{m} \beta \ne 0
$$

for all $0 < m < p$. 

7.10 Let $\alpha, \beta \in$ GF($p^m$). Show that 

$$
(\alpha + \beta)^p = \alpha^p + \beta^p
$$

7.11 Show that any binary linear block code of length $n$ has exactly $2^k$ codewords for some 
integer $k \le n$. 

7.12 Prove that the Hamming distance between two sequences of length $n$, denoted by $d_H(x, y)$, 
satisfies the following properties: 

1. $d_H(x, y) = 0$ if and only if $x = y$ 

2. $d_H(x, y) = d_H(y, x)$ 

3. $d_H(x, z) \le d_H(x, y) + d_H(y, z)$ 

These properties show that $d_H$ is a metric. 

7.13 The generator matrix for a linear binary code is 

$$
G = \begin{bmatrix}
0 & 0 & 1 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 1 & 1 & 0
\end{bmatrix}
$$

a. Express $G$ in systematic $[I|P]$ form. 

b. Determine the parity check matrix $H$ for the code. 

c. Construct the table of syndromes for the code. 

d. Determine the minimum distance of the code. 

e. Demonstrate that the codeword $c$ corresponding to the information sequence 101 satisfies 
$cH^t = 0$. 


7.14 A code is self-dual if $C = C^{\perp}$. Show that in a self-dual code the block length is always 
even and the rate is $\frac{1}{2}$. 


7.15 Consider a linear block code with codewords {0000, 1010, 0101, 1111}. Find the dual of 
this code and show that this code is self-dual. 


7.16 List the codewords generated by the matrices given in Equations 7.9-13 and 7.9-15, and 
thus demonstrate that these matrices generate the same set of codewords. 

7.17 Determine the weight distribution of the (7, 4) Hamming code, and check your result with 
the list of codewords given in Table 7.9-2. 

7.18 Show that for binary orthogonal signaling, for instance, orthogonal BFSK, we have 
$\Delta = e^{-\mathcal{E}_c/2N_0}$, where $\Delta$ is defined by Equation 7.2-36. 

7.19 Find the generator and the parity check matrices of a second-order (r = 2) Reed-Muller 
code with block length n = 16. Show that this code is the dual of a first-order Reed-Muller 
code with n = 16. 


7.20 Show that repetition codes whose block length is a power of 2 are Reed-Muller codes of 
order r = 0. 


7.21 When an ( n , k ) Hadamard code is mapped into waveforms by means of binary PSK, the 
corresponding M = 2 k waveforms are orthogonal. Determine the bandwidth expansion 
factor for the M orthogonal waveforms, and compare this with the bandwidth requirements 
of orthogonal FSK detected coherently. 

7.22 Show that the signaling waveforms generated from a maximum-length shift register code 
by mapping each bit in a codeword into a binary PSK signal are equicorrelated with 
correlation coefficient $\rho_r = -1/(M - 1)$, i.e., the $M$ waveforms form a simplex set. 

7.23 Using the generator matrix of a ($2^m - 1$, $m$) maximum-length code as defined in 
Section 7.3-3, do the following. 

a. Show that maximum-length codes are constant-weight codes; i.e., all nonzero 
codewords of a ($2^m - 1$, $m$) maximum-length code have weight $2^{m-1}$. 

b. Show that the weight distribution function of a maximum-length code is given by 
Equation 7.3^1. 

c. Use the MacWilliams identity to determine the weight distribution function of a 
($2^m - 1$, $2^m - 1 - m$) Hamming code as the dual to a maximum-length code. 

7.24 Compute the error probability obtained with a (7, 4) Hamming code on an AWGN channel, 
for both hard decision and soft decision decoding. Use Equations 7.4-18, 7.4-19, 7.5-6, 
and 7.5-18. 

7.25 Show that when a binary sequence x of length n is transmitted over a BSC with crossover
probability p, the probability of receiving y, which is at Hamming distance d from x, is
given by

$$P(y \mid x) = (1 - p)^n \left(\frac{p}{1 - p}\right)^d$$

From this conclude that if $p < \frac{1}{2}$, $P(y \mid x)$ is a decreasing function of d and hence
ML decoding is equivalent to minimum-Hamming-distance decoding. What happens if
$p > \frac{1}{2}$?

7.26 Using a symbolic computation program (e.g., Mathematica or Maple), find the weight 
enumeration polynomial for a (15, 11) Hamming code. Plot the probability of decoding 
error (when this code is used for error correction) and undetected error (when the code 
used for error detection) as a function of the channel error probability p in the range 
$10^{-6} < p < 10^{-1}$.

7.27 By using a computer find the number of codewords of weight 34 in a (63, 57) Hamming 
code. 

7.28 Prove that if the sum of two error patterns $e_1$ and $e_2$ is a valid codeword $c_j$, then each error
pattern has the same syndrome. 

7.29 Prove that any two n-tuples in the same row of a standard array add to produce a valid
codeword. 

7.30 Prove that 

1. Elements of the standard array of a linear block code are distinct. 

2. Two elements belonging to two distinct cosets of a standard array have distinct 
syndromes. 

7.31 A (k + 1 , k) block code is generated by adding 1 extra bit to each information sequence of 
length k such that the overall parity of the code (i.e., the number of 1s in each codeword) is
an odd number. Two students, A and B, make the following arguments on error detection 
capability of this code. 

1. Student A: Since the weight of each codeword is odd, any single error changes the
weight to an even number. Hence, this code is capable of detecting any single error.

2. Student B: The all-zero information sequence $\underbrace{00 \cdots 0}_{k}$ will be encoded by adding
one extra 1 to generate the codeword $\underbrace{00 \cdots 0}_{k}1$. This means that there is at least one
codeword of weight 1 in this code. Therefore, $d_{\min} = 1$, and since any code can detect
at most $d_{\min} - 1$ errors, and for this code $d_{\min} - 1 = 0$, this code cannot detect any
errors.

Which argument do you agree with and why? Give your explanation in one short paragraph. 

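7.32 The parity check matrix of a linear block code is given below:

$$H = \begin{bmatrix}
1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$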

1. Determine the generator matrix for this code in the systematic form. 

2. How many codewords are in this code? What is the $d_{\min}$ for this code?

3. What is the coding gain for this code (soft decision decoding and BPSK modulation 
over an AWGN channel are assumed)? 

4. Using hard decision decoding, how many errors can this code correct? 

5. Show that any two codewords of this code are orthogonal, and in particular any codeword 
is orthogonal to itself. 


7.33 A code C consists of all binary sequences of length 6 and weight 3. 

1. Is this code a linear block code? Why? 

2. What is the rate of this code? What is the minimum distance of this code? What is the 
minimum weight for this code? 

3. If the code is used for error detection, how many errors can it detect? 

4. If the code is used on a binary symmetric channel with crossover probability of p, what 
is the probability that an undetectable error occurs? 

5. Find the smallest linear block code $C_1$ such that $C \subseteq C_1$ (by the smallest code we mean
the code with the fewest codewords). 


7.34 A systematic (6, 3) code has the generator matrix

$$G = \begin{bmatrix}
1 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 1
\end{bmatrix}$$

Construct the standard array and determine the correctable error patterns and their corre- 
sponding syndromes. 


7.35 Construct the standard array for the (7, 3) code with generator matrix

$$G = \begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1
\end{bmatrix}$$


and determine the correctable patterns and their corresponding syndromes. 

7.36 A (6, 3) systematic linear block code encodes the information sequence $x = (x_1, x_2, x_3)$
into codeword $c = (c_1, c_2, c_3, c_4, c_5, c_6)$, such that $c_4$ is a parity check on $c_1$ and $c_2$, to
make the overall parity even (i.e., $c_1 \oplus c_2 \oplus c_4 = 0$). Similarly $c_5$ is a parity check on $c_2$
and $c_3$, and $c_6$ is a parity check on $c_1$ and $c_3$.

1. Determine the generator matrix of this code. 

2. Find the parity check matrix for this code. 

3. Using the parity check matrix, determine the minimum distance of this code. 

4. How many errors is this code capable of correcting? 

5. If the received sequence (using hard decision decoding) is y = 100000, what is the 
transmitted sequence using a maximum-likelihood decoder? (Assume that the crossover 
probability of the channel is less than $\frac{1}{2}$.)

7.37 C is a (6, 3) linear block code whose generator matrix is given by

$$G = \begin{bmatrix}
1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}$$

1. What rate, minimum distance, and coding gain can C provide in soft decision
decoding when BPSK is used over an AWGN channel?

2. Can you suggest another (6, 3) LBC that can provide a better coding gain? If the answer
is yes, what is its generator matrix and the resulting coding gain? If the answer is no,
why?

3. Suggest a parity check matrix H for C.

7.38 Prove that if C is MDS, its dual $C^{\perp}$ is also MDS.

7.39 Let n and t be positive integers such that $n > 2t$; hence $t/n < \frac{1}{2}$.

1. Show that for any $\lambda > 0$ we have

2. Assuming p = t/n in part 1, show that 



3. By choosing $\lambda = \log_2 \frac{1-p}{p}$ show that



4. Using Stirling's approximation that states that

$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\lambda_n}$$

where $\frac{1}{12n+1} < \lambda_n < \frac{1}{12n}$, show that for large n and t such that $\frac{t}{n} < \frac{1}{2}$ we have



2 


7.40 Let C denote an $(n, k)$ linear block code with minimum distance $d_{\min}$.

a. Let C denote a $2^k \times n$ matrix whose rows are all the codewords of C. Show that all
columns of C have equal weight and this weight is $2^{k-1}$.

b. Conclude that the total weight of the codewords of C is given by

$$\sum_{m=1}^{2^k} w(c_m) = n\,2^{k-1}$$

c. From part (b) conclude that the Plotkin bound

$$d_{\min} \le \frac{n\,2^{k-1}}{2^k - 1}$$

holds.

7.41 Construct an extended (8, 4) code from the (7, 4) Hamming code by specifying the
generator matrix and the parity check matrix.

7.42 The polynomial

$$g(X) = X^4 + X + 1$$

is the generator for the (15, 11) Hamming binary code.

a. Determine a generator matrix G for this code in systematic form.

b. Determine the generator polynomial for the dual code.

7.43 For the (7, 4) cyclic Hamming code with generator polynomial $g(X) = X^3 + X^2 + 1$,
construct an (8, 4) extended Hamming code and list all the codewords. What is $d_{\min}$ for
the extended code?

7.44 An (8, 4) linear block code is constructed by shortening a (15, 11) Hamming code generated
by the generator polynomial $g(X) = X^4 + X + 1$.

a. Construct the codewords of the (8, 4) code and list them.

b. What is the minimum distance of the (8, 4) code?

7.45 The polynomial $X^{15} + 1$ when factored yields

$$X^{15} + 1 = (X^4 + X^3 + 1)(X^4 + X^3 + X^2 + X + 1)(X^4 + X + 1)(X^2 + X + 1)(X + 1)$$

a. Construct a systematic (15, 5) code using the generator polynomial

$$g(X) = (X^4 + X^3 + X^2 + X + 1)(X^4 + X + 1)(X^2 + X + 1)$$


b. What is the minimum distance of the code? 

c. How many random errors per codeword can be corrected? 

d. How many errors can be detected by this code? 

e. List the codewords of a (15, 2) code constructed from the generator polynomial

$$g(X) = \frac{X^{15} + 1}{X^2 + X + 1}$$

and determine the minimum distance.


7.46 Construct the parity check matrices $H_1$ and $H_2$ corresponding to the generator matrices
$G_1$ and $G_2$ given by Equations 7.9-12 and 7.9-13, respectively.

7.47 Determine the correctable error patterns (of least weight) and their syndromes for the 
systematic (7, 4) cyclic Hamming code. 

7.48 Let $g(X) = X^8 + X^6 + X^4 + X^2 + 1$ be a polynomial over the binary field.

a. Find the lowest-rate cyclic code with generator polynomial g(X). What is the rate of 
this code? 

b. Find the minimum distance of the code found in (a). 

c. What is the coding gain for the code found in (a)? 

7.49 The polynomial g(X) = X + 1 over the binary field is considered. 

a. Show that this polynomial can generate a cyclic code for any choice of n. Find the 
corresponding k. 

b. Find the systematic form of G and H for the code generated by g(X). 

c. Can you say what type of code this generator polynomial generates? 

7.50 Design a (6, 2) cyclic code by choosing the shortest possible generator polynomial. 

a. Determine the generator matrix G (in the systematic form) for this code, and find all 
possible codewords. 

b. How many errors can be corrected by this code? 

7.51 Let $C_1$ and $C_2$ denote two cyclic codes with the same block length n, with generator
polynomials $g_1(X)$ and $g_2(X)$, and with minimum distances $d_1$ and $d_2$, respectively. Define
$C_{\max} = C_1 \cup C_2$ and $C_{\min} = C_1 \cap C_2$.

1. Is $C_{\max}$ a cyclic code? Why? If yes, what is its generator polynomial and its minimum
distance?

2. Is $C_{\min}$ a cyclic code? Why? If yes, find its generator polynomial. What can you say
about its minimum distance?


7.52 We know that cyclic codes for all possible values of $(n, k)$ do not exist.

1. Give an example of an (n, k) pair for which no cyclic code exists ( k < n). 

2. How many (10, 2) cyclic codes do exist? Determine the generator polynomial of one 
such code. 

3. Determine the minimum distance of the code in part 2. 

4. How many errors can the code in part 2 correct? 

5. If this code is employed for transmission over a channel which uses binary antipodal 
signaling with hard decision decoding and the SNR per bit of the channel is $\gamma_b = 3$ dB,
determine an upper bound on the error probability of the system. 

7.53 What are the possible rates for cyclic codes with block length 23? List all possible generator
polynomials and specify the generator polynomial of the (23, 12) Golay code.

7.54 Let s(X) denote the syndrome corresponding to error sequence e(X) in an $(n, k)$ cyclic
code with generator polynomial g(X). Show that the syndrome corresponding to $e^{(1)}(X)$,
the right cyclic shift of e(X), is $s^{(1)}(X)$, defined by

$$s^{(1)}(X) = X s(X) \bmod g(X)$$

7.55 Is the following statement true or false? If it is true, prove it; and if it is false, give a 
counterexample: The minimum weight of a cyclic code is equal to the number of nonzero 
coefficients of its generator polynomial. 

7.56 Determine the generator polynomial and the rate of a double-error-correcting BCH code 
with block length n = 31. 

7.57 In the BCH code designed in Problem 7.56 the received sequence is

r = 0000000000000000000011001001001

Using the Berlekamp-Massey algorithm, detect the error locations.

7.58 Solve Problem 7.57 when the received sequence is

r = 1110000000000000000011101101001

7.59 Beginning with a (15, 7) BCH code, construct a shortened (12, 4) code. Give the generator 
matrix for the shortened code. 

7.60 Determine the generator polynomial and the rate of a double-error-correcting Reed- 
Solomon code with block length n = 7. 

7.61 Determine the generator polynomial and the rate of a triple-error-correcting Reed-Solomon 
code with block length n = 63. How many codewords does this code have? 

7.62 What is the weight distribution function of the Reed-Solomon code designed in 
Problem 7.60? 

7.63 Prove that in the product code shown in Figure 7.13-1 the $(n_1 - k_1) \times (n_2 - k_2)$ bits in the
lower right corner can be obtained as either the parity checks on the rows or parity checks 
on the columns. 

7.64 Prove that the minimum distance of a product code is the product of the minimum distances 
of the two constituent codes. 

Linear block codes were studied in detail in Chapter 7. These codes are mainly used
with hard decision decoding that employs the built-in algebraic structure of the code 
based on the properties of finite fields. Hard decision decoding of these codes results in 
a binary symmetric channel model consisting of the binary modulator, the waveform 
channel, and the optimum binary detector. The decoder for these codes tries to find the 
codeword at the minimum Hamming distance from the output of the BSC. The goal in 
designing good linear block codes is to find the code with highest minimum distance 
for a given n and k. 

In this chapter we introduce another class of codes whose structure is more con- 
veniently described in terms of trellises or graphs. We will see that for this family of 
codes, soft decision decoding is possible, and in some cases performance very close to 
channel capacity is achievable. 


8.1 THE STRUCTURE OF CONVOLUTIONAL CODES

A convolutional code is generated by passing the information sequence to be transmitted
through a linear finite-state shift register. In general, the shift register consists of K
(k-bit) stages and n linear algebraic function generators, as shown in Figure 8.1-1. The
input data to the encoder, which is assumed to be binary, is shifted into and along the
shift register k bits at a time. The number of output bits for each k-bit input sequence is
n bits. Consequently, the code rate is defined as $R_c = k/n$, consistent with the definition
of the code rate for a block code. The parameter K is called the constraint length of
the convolutional code.†


†In many cases, the constraint length of the code is given in bits rather than k-bit bytes. Hence, the shift
register may be called an L-stage shift register, where L = Kk. Furthermore, L may not be a multiple of
k, in general.

FIGURE 8.1-1 Convolutional encoder.


One method for describing a convolutional code is to give its generator matrix, just 
as we did for block codes. In general, the generator matrix for a convolutional code 
is semi-infinite since the input sequence is semi-infinite in length. As an alternative to
specifying the generator matrix, we shall use a functionally equivalent representation
in which we specify a set of n vectors, one vector for each of the n modulo-2 adders.
Each vector has Kk dimensions and contains the connections of the encoder to that
modulo-2 adder. A 1 in the ith position of the vector indicates that the corresponding
stage in the shift register is connected to the modulo-2 adder, and a 0 in a given position 
indicates that no connection exists between that stage and the modulo-2 adder. 

To be specific, let us consider the binary convolutional encoder with constraint
length K = 3, k = 1, and n = 3, which is shown in Figure 8.1-2. Initially, the shift
register is assumed to be in the all-zeros state. Suppose the first input bit is a 1. Then the
output sequence of 3 bits is 111. Suppose the second bit is a 0. The output sequence will
then be 001. If the third bit is a 1, the output will be 100, and so on. Now, suppose we
number the outputs of the function generators that generate each 3-bit output sequence
as 1, 2, and 3, from top to bottom, and similarly number each corresponding function
generator. Then, since only the first stage is connected to the first function generator
(no modulo-2 adder is needed), the generator is

$$g_1 = [1\ 0\ 0]$$

The second function generator is connected to stages 1 and 3. Hence

$$g_2 = [1\ 0\ 1]$$

Finally,

$$g_3 = [1\ 1\ 1]$$

FIGURE 8.1-2 K = 3, k = 1, n = 3 convolutional encoder.


The generators for this code are more conveniently given in octal form as (4, 5, 7). We 
conclude that when k = 1, we require n generators, each of dimension K to specify 
the encoder. 
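
The shift-register description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `conv_encode` and the list-based register are hypothetical choices, and the sketch covers the k = 1 case only. It reproduces the outputs 111, 001, 100 quoted in the text for the input bits 1, 0, 1.

```python
def conv_encode(bits, generators, K):
    """Encode a bit list with a rate-1/n convolutional code (k = 1).

    generators: n tuples of length K; a 1 in position i means that stage i+1
    of the shift register is connected to that modulo-2 adder.
    (Hypothetical helper written for illustration.)
    """
    register = [0] * K                       # all-zeros initial state
    out = []
    for b in bits:
        register = [b] + register[:-1]       # shift the new bit into stage 1
        for g in generators:
            out.append(sum(r & t for r, t in zip(register, g)) % 2)
    return out

G_457 = [(1, 0, 0), (1, 0, 1), (1, 1, 1)]    # g1, g2, g3, octal (4, 5, 7)
encoded = conv_encode([1, 0, 1], G_457, K=3)
print(encoded)   # [1, 1, 1, 0, 0, 1, 1, 0, 0]
```

Each group of three output bits corresponds to one input bit, so the printed list reads 111, 001, 100.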

It is clear that $g_1$, $g_2$, and $g_3$ are the impulse responses from the encoder input to
the three outputs. Then if the input to the encoder is the information sequence u, the
three outputs are given by

$$\begin{aligned}
c^{(1)} &= u \star g_1 \\
c^{(2)} &= u \star g_2 \\
c^{(3)} &= u \star g_3
\end{aligned} \tag{8.1-1}$$

where $\star$ denotes the convolution operation. The corresponding code sequence c is the
result of interleaving $c^{(1)}$, $c^{(2)}$, and $c^{(3)}$ as

$$c = \left(c_1^{(1)}, c_1^{(2)}, c_1^{(3)}, c_2^{(1)}, c_2^{(2)}, c_2^{(3)}, \ldots\right) \tag{8.1-2}$$

The convolution operation is equivalent to multiplication in the transform domain.
We define the D transform† of u as

$$u(D) = \sum_{i=0}^{\infty} u_i D^i \tag{8.1-3}$$


and the transfer functions for the three impulse responses $g_1$, $g_2$, and $g_3$ as

$$\begin{aligned}
g_1(D) &= 1 \\
g_2(D) &= 1 + D^2 \\
g_3(D) &= 1 + D + D^2
\end{aligned} \tag{8.1-4}$$

The output transforms are then given by

$$\begin{aligned}
c^{(1)}(D) &= u(D) g_1(D) \\
c^{(2)}(D) &= u(D) g_2(D) \\
c^{(3)}(D) &= u(D) g_3(D)
\end{aligned} \tag{8.1-5}$$

and the transform of the encoder output c is given by

$$c(D) = c^{(1)}(D^3) + D c^{(2)}(D^3) + D^2 c^{(3)}(D^3) \tag{8.1-6}$$

EXAMPLE 8.1-1. Let the sequence u = (100111) be the input sequence to the
convolutional encoder shown in Figure 8.1-2. We have

$$u(D) = 1 + D^3 + D^4 + D^5$$

and

$$\begin{aligned}
c^{(1)}(D) &= (1 + D^3 + D^4 + D^5)(1) = 1 + D^3 + D^4 + D^5 \\
c^{(2)}(D) &= (1 + D^3 + D^4 + D^5)(1 + D^2) = 1 + D^2 + D^3 + D^4 + D^6 + D^7 \\
c^{(3)}(D) &= (1 + D^3 + D^4 + D^5)(1 + D + D^2) = 1 + D + D^2 + D^3 + D^5 + D^7
\end{aligned}$$

and

$$\begin{aligned}
c(D) &= c^{(1)}(D^3) + D c^{(2)}(D^3) + D^2 c^{(3)}(D^3) \\
&= 1 + D + D^2 + D^5 + D^7 + D^8 + D^9 + D^{10} + D^{11} + D^{12} + D^{13} + D^{15} \\
&\quad + D^{17} + D^{19} + D^{22} + D^{23}
\end{aligned}$$

corresponding to the code sequence

c = (111001011111110101010011)

†Using the D transform is common in coding literature where D denotes the unit delay introduced by
one memory element in the shift register. By substituting $D = z^{-1}$, the D transform becomes the familiar
z transform.

FIGURE 8.1-3 K = 2, k = 2, n = 3 convolutional encoder.
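
The arithmetic of Example 8.1-1 can be checked numerically. The following is a minimal sketch, assuming polynomials over GF(2) are stored as coefficient lists indexed by the power of D; the helper name `poly_mul_gf2` is hypothetical. It forms the three products u(D)g_j(D) and interleaves them, which is what the substitution D → D^3 in the expression for c(D) amounts to.

```python
def poly_mul_gf2(a, b):
    """Multiply two GF(2) polynomials given as coefficient lists (index = power of D)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out

u = [1, 0, 0, 1, 1, 1]                         # u(D) = 1 + D^3 + D^4 + D^5
gens = [[1, 0, 0], [1, 0, 1], [1, 1, 1]]       # g1(D), g2(D), g3(D)
streams = [poly_mul_gf2(u, g) for g in gens]   # c^(j)(D) = u(D) g_j(D)

# Interleaving the three streams realizes c1(D^3) + D c2(D^3) + D^2 c3(D^3).
c = [bit for triple in zip(*streams) for bit in triple]
code_sequence = "".join(map(str, c))
print(code_sequence)   # 111001011111110101010011
```

The result has 24 bits, not 18, because the products have degree up to 7: the last two triples are the outputs produced while the register empties.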

For a rate k/n binary convolutional code with k > 1 and constraint length K,
the n generators are Kk-dimensional vectors, as stated above. The following example
illustrates the case in which k = 2 and n = 3.

EXAMPLE 8.1-2. Consider the rate 2/3 convolutional encoder illustrated in
Figure 8.1-3. In this encoder, 2 bits at a time are shifted into it, and 3 output bits are
generated. The generators are

$$g_1 = [1\ 0\ 1\ 1], \quad g_2 = [1\ 1\ 0\ 1], \quad g_3 = [1\ 0\ 1\ 0]$$

In octal form, these generators are (13, 15, 12).

The code shown in Figure 8.1-3 can also be realized by the diagram shown in
Figure 8.1-4. In this realization, instead of a single shift register of length 4, two shift
registers each of length 2 are employed. The information sequence u is split into two
substreams $u^{(1)}$ and $u^{(2)}$ using a serial-to-parallel converter. Each of the two substreams
is the input to one of the two shift registers. At the output, the three generated sequences
$c^{(1)}$, $c^{(2)}$, and $c^{(3)}$ are interleaved to generate the code sequence c. In general, instead of
one shift register with length L = Kk, we can use a parallel implementation of k shift
registers each of length K.

FIGURE 8.1-4 Double shift register implementation of the convolutional encoder shown in
Figure 8.1-3.

In the implementation shown in Figure 8.1-4, the encoder has two input sequences
$u^{(1)}$ and $u^{(2)}$ and three output sequences $c^{(1)}$, $c^{(2)}$, and $c^{(3)}$. The encoder thus can be
described in terms of six impulse responses, and hence six transfer functions which are
the D transforms of the impulse responses. If we denote by $g_i^{(j)}$ the impulse response
from input stream $u^{(i)}$ to the output stream $c^{(j)}$, in the encoder depicted in Figure 8.1-4
we have

$$\begin{aligned}
g_1^{(1)} &= [0\ 1] & g_2^{(1)} &= [1\ 1] \\
g_1^{(2)} &= [1\ 1] & g_2^{(2)} &= [1\ 0] \\
g_1^{(3)} &= [0\ 0] & g_2^{(3)} &= [1\ 1]
\end{aligned} \tag{8.1-7}$$

and the transfer functions are

$$\begin{aligned}
g_1^{(1)}(D) &= D & g_2^{(1)}(D) &= 1 + D \\
g_1^{(2)}(D) &= 1 + D & g_2^{(2)}(D) &= 1 \\
g_1^{(3)}(D) &= 0 & g_2^{(3)}(D) &= 1 + D
\end{aligned} \tag{8.1-8}$$

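From the transfer functions and the D transform of the input sequences we obtain
the D transform of the three output sequences as

$$\begin{aligned}
c^{(1)}(D) &= u^{(1)}(D) g_1^{(1)}(D) + u^{(2)}(D) g_2^{(1)}(D) \\
c^{(2)}(D) &= u^{(1)}(D) g_1^{(2)}(D) + u^{(2)}(D) g_2^{(2)}(D) \\
c^{(3)}(D) &= u^{(1)}(D) g_1^{(3)}(D) + u^{(2)}(D) g_2^{(3)}(D)
\end{aligned} \tag{8.1-9}$$

and finally

$$c(D) = c^{(1)}(D^3) + D c^{(2)}(D^3) + D^2 c^{(3)}(D^3) \tag{8.1-10}$$

Equation 8.1-9 can be written in a more compact way by defining

$$\mathbf{u}(D) = \begin{bmatrix} u^{(1)}(D) & u^{(2)}(D) \end{bmatrix} \tag{8.1-11}$$

and

$$\mathbf{G}(D) = \begin{bmatrix}
g_1^{(1)}(D) & g_1^{(2)}(D) & g_1^{(3)}(D) \\
g_2^{(1)}(D) & g_2^{(2)}(D) & g_2^{(3)}(D)
\end{bmatrix} \tag{8.1-12}$$

By these definitions Equation 8.1-9 can be written as

$$\mathbf{c}(D) = \mathbf{u}(D)\mathbf{G}(D) \tag{8.1-13}$$

where

$$\mathbf{c}(D) = \begin{bmatrix} c^{(1)}(D) & c^{(2)}(D) & c^{(3)}(D) \end{bmatrix} \tag{8.1-14}$$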

In general, matrix G(D) is a k × n matrix whose elements are polynomials in
D with degree at most K - 1. This matrix is called the transform domain generator
matrix of the convolutional code. For the code whose encoder is shown in Figure 8.1-4
we have

$$G(D) = \begin{bmatrix} D & 1 + D & 0 \\ 1 + D & 1 & 1 + D \end{bmatrix} \tag{8.1-15}$$

and for the convolutional code shown in Figure 8.1-2 we have

$$G(D) = \begin{bmatrix} 1 & D^2 + 1 & D^2 + D + 1 \end{bmatrix} \tag{8.1-16}$$

8.1-1 Tree, Trellis, and State Diagrams 


There are three alternative methods that are often used to describe a convolutional code. 
These are the tree diagram, the trellis diagram, and the state diagram. For example, 
the tree diagram for the convolutional encoder shown in Figure 8.1-2 is illustrated in 
Figure 8.1-5. Assuming that the encoder is in the all-zeros state initially, the diagram 
shows that if the first input bit is a 0, the output sequence is 000 and if the first bit is a 1,
the output sequence is 111. Now, if the first input bit is a 1 and the second bit is a 0, the
second set of 3 output bits is 001. Continuing through the tree, we see that if the third bit
is a 0, then the output is 011, while if the third bit is a 1, then the output is 100. Given that
a particular sequence has taken us to a particular node in the tree, the branching rule is
to follow the upper branch if the next input bit is a 0 and the lower branch if the bit is a 1.
Thus, we trace a particular path through the tree that is determined by the input sequence. 

Close observation of the tree that is generated by the convolutional encoder shown 
in Figure 8.1-5 reveals that the structure repeats itself after the third stage. This behavior 
is consistent with the fact that the constraint length K = 3. That is, the 3-bit output 
sequence at each stage is determined by the input bit and the 2 previous input bits, i.e., 
the 2 bits contained in the first two stages of the shift register. The bit in the last stage of 
the shift register is shifted out at the right and does not affect the output. Thus we may 
say that the 3-bit output sequence for each input bit is determined by the input bit and 
the four possible states of the shift register, denoted as a = 00, b = 01, c = 10, d = 11.


FIGURE 8.1-5 Tree diagram for rate 1/3, K = 3 convolutional code.

FIGURE 8.1-6 Trellis diagram for rate 1/3, K = 3 convolutional code.

If we label each node in the tree to correspond to the four possible states in the shift 
register, we find that at the third stage there are two nodes with label a, two with label 
b, two with label c, and two with label d. Now we observe that all branches emanating 
from two nodes having the same label (same state) are identical in the sense that they 
generate identical output sequences. This means that the two nodes having the same 
label can be merged. If we do this to the tree shown in Figure 8.1-5, we obtain another 
diagram, which is more compact, namely, a trellis. For example, the trellis diagram for 
the convolutional encoder of Figure 8.1-2 is shown in Figure 8.1-6. In drawing this 
diagram, we use the convention that a solid line denotes the output generated by the 
input bit 0 and a dotted line the output generated by the input bit 1. In the example being
considered, we observe that, after the initial transient, the trellis contains four nodes at
each stage, corresponding to the four states of the shift register, a, b, c, and d. After the
second stage, each node in the trellis has two incoming paths and two outgoing paths.
Of the two outgoing paths, one corresponds to the input bit 0 and the other to the path
followed if the input bit is a 1.

Since the output of the encoder is determined by the input and the state of the 
encoder, an even more compact diagram than the trellis is the state diagram. The 
state diagram is simply a graph of the possible states of the encoder and the possible 
transitions from one state to another. For example, the state diagram for the encoder 
shown in Figure 8.1-2 is illustrated in Figure 8.1-7. This diagram shows that the 
possible transitions are 

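$$a \xrightarrow{0} a, \quad a \xrightarrow{1} c, \quad b \xrightarrow{0} a, \quad b \xrightarrow{1} c, \quad c \xrightarrow{0} b, \quad c \xrightarrow{1} d, \quad d \xrightarrow{0} b, \quad d \xrightarrow{1} d$$

where $\alpha \xrightarrow{1} \beta$ denotes the transition from state $\alpha$ to $\beta$ when the input bit is a 1. The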

3 bits shown next to each branch in the state diagram represent the output bits. A dotted 
line in the graph indicates that the input bit is a 1, while the solid line indicates that the 
input bit is a 0. 
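
The state diagram of the rate 1/3, K = 3 encoder can be tabulated from its generators. The following is a minimal sketch, assuming the state is the pair of the two most recent input bits with the newest bit first (so that a = 00, b = 01, c = 10, d = 11 as in the text); the names `step` and `transitions` are hypothetical.

```python
GENS = [(1, 0, 0), (1, 0, 1), (1, 1, 1)]        # g1, g2, g3 of the (4, 5, 7) encoder
NAMES = {(0, 0): "a", (0, 1): "b", (1, 0): "c", (1, 1): "d"}

def step(state, bit):
    """Return (next_state, output_bits) for one input bit.

    state holds the two most recent input bits, newest first.
    """
    register = (bit,) + state
    output = tuple(sum(r & g for r, g in zip(register, gen)) % 2 for gen in GENS)
    return register[:2], output

transitions = {}
for state, name in NAMES.items():
    for bit in (0, 1):
        nxt, out = step(state, bit)
        transitions[(name, bit)] = (NAMES[nxt], "".join(map(str, out)))
        print(f"{name} --{bit}--> {NAMES[nxt]}   output {transitions[(name, bit)][1]}")
```

The eight printed lines list the same transitions as the text, each with the 3-bit branch output that labels the corresponding edge of the state diagram.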

EXAMPLE 8.1-3. Let us consider the k = 2, rate 2/3 convolutional code described in
Example 8.1-2 and shown in Figure 8.1-3. The first two input bits may be 00, 01, 10,
or 11. The corresponding output bits are 000, 010, 111, 101. When the next pair of input
bits enters the encoder, the first pair is shifted to the second stage. The corresponding
output bits depend on the pair of bits shifted into the second stage and the new pair
of input bits. Hence, the tree diagram for this code, shown in Figure 8.1-8, has four
branches per node, corresponding to the four possible pairs of input symbols.

FIGURE 8.1-7 State diagram for rate 1/3, K = 3 convolutional code.

FIGURE 8.1-8 Tree diagram for K = 2, k = 2, n = 3 convolutional code.

FIGURE 8.1-9 Trellis diagram for K = 2, k = 2, n = 3 convolutional code.

Since the constraint length of the code is K = 2, the tree begins to repeat after 
the second stage. As illustrated in Figure 8.1-8, all the branches emanating from nodes 
labeled a (state a ) yield identical outputs. 

By merging the nodes having identical labels, we obtain the trellis, which is shown 
in Figure 8.1-9. Finally, the state diagram for this code is shown in Figure 8.1-10. 
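
The trellis of the rate 2/3 code can also be generated from its generators (13, 15, 12). The sketch below is illustrative only: it assumes the four register positions indexed by each generator vector are the new input pair followed by the stored pair, an ordering chosen because it reproduces the first-branch outputs 000, 010, 111, 101 quoted in Example 8.1-3. The names `step` and `trellis` are hypothetical.

```python
from collections import Counter
from itertools import product

GENS_23 = [(1, 0, 1, 1), (1, 1, 0, 1), (1, 0, 1, 0)]   # g1, g2, g3, octal (13, 15, 12)
K, k = 2, 2

def step(state, pair):
    """state: the k(K-1) bits held in the older stages; pair: the k new input bits."""
    register = tuple(pair) + tuple(state)
    output = tuple(sum(r & g for r, g in zip(register, gen)) % 2 for gen in GENS_23)
    return register[:k * (K - 1)], output

states = list(product((0, 1), repeat=k * (K - 1)))
pairs = list(product((0, 1), repeat=k))
trellis = {(s, p): step(s, p) for s in states for p in pairs}

# Outputs on the four branches leaving the all-zeros state.
first_outputs = ["".join(map(str, trellis[((0, 0), p)][1])) for p in pairs]
# Number of branches entering each state.
incoming = Counter(nxt for nxt, _ in trellis.values())
print(len(states), first_outputs, set(incoming.values()))
# 4 ['000', '010', '111', '101'] {4}
```

The counts agree with the diagrams: four states, four branches leaving each state (one per input pair), and four branches entering each state.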

To generalize, we state that a rate k/n, constraint length K, convolutional code is
characterized by $2^k$ branches emanating from each node of the tree diagram. The trellis
and the state diagrams each have $2^{k(K-1)}$ possible states. There are $2^k$ branches entering
each state and $2^k$ branches leaving each state (in the trellis and tree, this is true after the
initial transient). The three types of diagrams described above are also used to represent
nonbinary convolutional codes. When the number of symbols in the code alphabet is
$q = 2^k$, k > 1, the resulting nonbinary code may also be represented as an equivalent
binary code. The following example considers a convolutional code of this type.

EXAMPLE 8.1-4. Let us consider the convolutional code generated by the encoder
shown in Figure 8.1-11. This code may be described as a binary convolutional code
with parameters K = 2, k = 2, n = 4, $R_c = 1/2$ and having the generators

$$g_1 = [1\ 0\ 1\ 0], \quad g_2 = [0\ 1\ 0\ 1], \quad g_3 = [1\ 1\ 1\ 0], \quad g_4 = [1\ 0\ 0\ 1]$$

Except for the difference in rate, this code is similar in form to the rate 2/3, k = 2
convolutional code considered in Example 8.1-2. Alternatively, the code generated by
the encoder in Figure 8.1-11 may be described as a nonbinary (q = 4) code with one
quaternary symbol as an input and two quaternary symbols as an output. In fact, if the
output of the encoder is treated by the modulator and demodulator as q-ary (q = 4)
symbols that are transmitted over the channel by means of some M-ary (M = 4)
modulation technique, the code is appropriately viewed as nonbinary. In any case, the
tree, the trellis, and the state diagrams are independent of how we view the code. That
is, this particular code is characterized by a tree with four branches emanating from
each node, or a trellis with four possible states and four branches entering and leaving
each state, or, equivalently, by a state diagram having the same parameters as the trellis.

FIGURE 8.1-10 State diagram for K = 2, k = 2, n = 3 convolutional code.


8.1-2 The Transfer Function of a Convolutional Code 

We have seen in Section 7.2-3 that the distance properties of block codes can be
expressed in terms of the weight distribution, or weight enumeration polynomial of
the code. The weight distribution polynomial can be used to find performance bounds
for linear block codes as given by Equations 7.2-39, 7.2-48, 7.4-4, and 7.5-17. The
distance properties and the error rate performance of a convolutional code can be
similarly obtained from its state diagram. Since a convolutional code is linear, the set of
Hamming distances of the code sequences generated up to some stage in the tree, from
the all-zero code sequence, is the same as the set of distances of the code sequences
with respect to any other code sequence. Consequently, we assume without loss of
generality that the all-zero code sequence is the input to the encoder. Therefore, instead
of studying distance properties of the code we will study the weight distribution of the
code, as we did for the case of block codes.

FIGURE 8.1-11 K = 2, k = 2, n = 4 convolutional encoder.

The state diagram shown in Figure 8.1-7 will be used to demonstrate the method 
for obtaining the distance properties of a convolutional code. We assume that the 
all-zero sequence is transmitted, and we focus on error events corresponding to a 
departure from the all-zero path on the code trellis and returning to it for the first 
time. 

First, we label the branches of the state diagram as $Z^0 = 1$, $Z^1$, $Z^2$, or $Z^3$, where
the exponent of Z denotes the Hamming distance between the sequence of output bits 
corresponding to each branch and the sequence of output bits corresponding to the 
all-zero branch. The self-loop at node a can be eliminated, since it contributes nothing 
to the distance properties of a code sequence relative to the all-zero code sequence 
and does not represent a departure from the all-zero sequence. Furthermore, node a is 
split into two nodes, one of which represents the input and the other the output of the 
state diagram, corresponding to the departure from the all-zero path and returning to it 
for the first time. Figure 8.1-12 illustrates the resulting diagram. We use this diagram, 
which now consists of five nodes because node a was split into two, to write the four 
state equations 



$$\begin{aligned}
X_c &= Z^3 X_a + Z X_b \\
X_b &= Z X_c + Z X_d \\
X_d &= Z^2 X_c + Z^2 X_d \\
X_e &= Z^2 X_b
\end{aligned} \tag{8.1-17}$$

FIGURE 8.1-12 State diagram for rate 1/3, K = 3 convolutional code.

The transfer function for the code is defined as $T(Z) = X_e/X_a$. By solving the
state equations given above, we obtain

$$T(Z) = \frac{Z^6}{1 - 2Z^2} = Z^6 + 2Z^8 + 4Z^{10} + 8Z^{12} + \cdots = \sum_{d=6}^{\infty} a_d Z^d \tag{8.1-18}$$

where, by definition,

$$a_d = \begin{cases} 2^{(d-6)/2} & \text{even } d \\ 0 & \text{odd } d \end{cases} \tag{8.1-19}$$
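
The elimination that leads to Equation 8.1-18 can be spelled out step by step. The derivation below is a sketch that takes the four state equations to be those obtained from the branch weights of the (4, 5, 7) encoder, namely $X_c = Z^3 X_a + Z X_b$, $X_b = Z X_c + Z X_d$, $X_d = Z^2 X_c + Z^2 X_d$, and $X_e = Z^2 X_b$.

```latex
\begin{aligned}
X_d &= Z^2 X_c + Z^2 X_d
  &&\Rightarrow\quad X_d = \frac{Z^2}{1 - Z^2}\, X_c \\
X_b &= Z X_c + Z X_d = Z X_c \left(1 + \frac{Z^2}{1 - Z^2}\right)
  &&\Rightarrow\quad X_b = \frac{Z}{1 - Z^2}\, X_c \\
X_c &= Z^3 X_a + Z X_b = Z^3 X_a + \frac{Z^2}{1 - Z^2}\, X_c
  &&\Rightarrow\quad X_c = \frac{Z^3 (1 - Z^2)}{1 - 2Z^2}\, X_a \\
X_e &= Z^2 X_b = \frac{Z^3}{1 - Z^2}\, X_c
  &&\Rightarrow\quad T(Z) = \frac{X_e}{X_a} = \frac{Z^6}{1 - 2Z^2} \\
\frac{1}{1 - 2Z^2} &= \sum_{m=0}^{\infty} 2^m Z^{2m}
  &&\Rightarrow\quad T(Z) = \sum_{m=0}^{\infty} 2^m Z^{6 + 2m}
\end{aligned}
```

Setting d = 6 + 2m in the last line gives $a_d = 2^{(d-6)/2}$ for even $d \ge 6$ and $a_d = 0$ otherwise, in agreement with Equation 8.1-19.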


The transfer function for this code indicates that there is a single path of Hamming 
distance d = 6 from the all-zero path that merges with the all-zero path at a given 
node. From the state diagram shown in Figure 8.1-7 or the trellis diagram shown in 
Figure 8.1-6, it is observed that the d = 6 path is acbe. There is no other path from node 
a to node e having a distance d = 6. The second term in Equation 8.1-18 indicates that 
there are two paths from node a to node e having a distance d = 8. Again, from the state 
diagram or the trellis, we observe that these paths are acdbe and acbcbe. The third term 
in Equation 8.1-18 indicates that there are four paths of distance d = 10, and so forth. 
Thus the transfer function gives us the distance properties of the convolutional code. 
The minimum distance of the code is called the minimum free distance and denoted by 
$d_{\text{free}}$. In our example, $d_{\text{free}} = 6$. 
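The coefficients $a_d$ of Equation 8.1-18 can be cross-checked by counting paths directly on the split state diagram. The sketch below is an illustration, not part of the original derivation. It assumes the branch weights implied by the state equations (a to c: 3, c to b: 1, c to d: 2, b to c: 1, d to b: 1, d to d: 2, b to e: 2); the names `EDGES` and `path_counts` are hypothetical.

```python
from collections import defaultdict

# Branch weights of the split state diagram: (from, to) -> Hamming weight.
# Assumed from the state equations; node e is the "output" copy of node a.
EDGES = {
    ("a", "c"): 3, ("b", "c"): 1,
    ("c", "b"): 1, ("d", "b"): 1,
    ("c", "d"): 2, ("d", "d"): 2,
    ("b", "e"): 2,
}

def path_counts(max_distance):
    """Return {d: number of a -> e paths of total weight d} for d <= max_distance."""
    # counts[state][d] = number of partial paths from a that reach state with weight d
    counts = defaultdict(lambda: defaultdict(int))
    counts["a"][0] = 1
    result = defaultdict(int)
    # Every branch has weight >= 1, so counts[.][d] is final once we reach d.
    for d in range(max_distance + 1):
        for (src, dst), w in EDGES.items():
            n = counts[src].get(d, 0)
            if n and d + w <= max_distance:
                if dst == "e":
                    result[d + w] += n
                else:
                    counts[dst][d + w] += n
    return dict(result)

print(path_counts(12))
```

Under these assumptions the counts for d = 6, 8, 10, 12 are 1, 2, 4, 8, in agreement with $a_d = 2^{(d-6)/2}$.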

The transfer function T(Z) introduced above is similar to the weight enumeration 
function (WEF) A(Z) for block codes introduced in Chapter 7. The main difference 
is that in the transfer function of a convolutional code the term corresponding to the 
loop at the all-zero state is eliminated; hence the all-zero code sequence is not included, 
and therefore the lowest power in the transfer function is $d_{\text{free}}$. In determining A(Z) 
we include the all-zero codeword, hence A(Z) always contains a constant equal to 1. 
Another difference is that in determining the transfer function of a convolutional code, 
we consider only paths in the trellis that depart from the all-zero state and return to it 
for the first time. Such a path is called a first event error and is used to bound the error 
probability of convolutional codes. 

The transfer function can be used to provide more detailed information than just 
the distance of the various paths. Suppose we introduce a factor Y into all branch 
transitions caused by the input bit 1. Thus, as each branch is traversed, the cumulative 
exponent on Y increases by 1 only if that branch transition is due to an input bit 1. 
Furthermore, we introduce a factor of J into each branch of the state diagram so that 
the exponent of J will serve as a counting variable to indicate the number of branches 
in any given path from node a to node e. For the rate 1/3 convolutional code in our 
example, the state diagram that incorporates the additional factors of J and Y is shown 
in Figure 8.1-13. 

FIGURE 8.1-13 

State diagram for rate 1/3, K = 3 convolutional code. 


The state equations for the state diagram shown in Figure 8.1-13 are 

$$
\begin{aligned}
X_c &= JYZ^3 X_a + JYZ X_b \\
X_b &= JZ X_c + JZ X_d \\
X_d &= JYZ^2 X_c + JYZ^2 X_d \\
X_e &= JZ^2 X_b
\end{aligned}
\tag{8.1-20}
$$

Upon solving these equations for the ratio $X_e/X_a$, we obtain the transfer function 

$$
\begin{aligned}
T(Y, Z, J) &= \frac{J^3 Y Z^6}{1 - JYZ^2(1 + J)} \\
&= J^3YZ^6 + J^4Y^2Z^8 + J^5Y^2Z^8 + J^5Y^3Z^{10} + 2J^6Y^3Z^{10} + J^7Y^3Z^{10} + \cdots
\end{aligned}
\tag{8.1-21}
$$


This form for the transfer function gives the properties of all the paths in the 
convolutional code. That is, the first term in the expansion of T(Y, Z, J) indicates that 
the distance d = 6 path is of length 3 and of the three information bits, one is a 1. The 
second and third terms in the expansion of T(Y, Z, J) indicate that of the two d = 8 
terms, one is of length 4 and the second has length 5. Two of the four information 
bits in the path having length 4 and two of the five information bits in the path having 
length 5 are 1s. Thus, the exponent of the factor J indicates the length of the path that 
merges with the all-zero path for the first time, the exponent of the factor Y indicates the 
number of 1s in the information sequence for that path, and the exponent of Z indicates 
the distance of the sequence of encoded bits for that path from the all-zero sequence 
(the weight of the code sequence). 
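The interpretation of the exponents of J, Y, and Z can be illustrated by enumerating first-event paths with a small encoder simulation. This is only a sketch: it assumes the rate 1/3, K = 3 encoder has generators $g_1 = [100]$, $g_2 = [101]$, $g_3 = [111]$, which reproduce the branch labels in the state equations above, and the function names are hypothetical.

```python
from collections import Counter

def step(state, u):
    """One encoder transition; state = (previous bit, bit before that).
    Returns (next_state, Hamming weight of the three output bits)."""
    s1, s2 = state
    c1 = u              # g1 = [1 0 0] (assumed)
    c2 = u ^ s2         # g2 = [1 0 1] (assumed)
    c3 = u ^ s1 ^ s2    # g3 = [1 1 1] (assumed)
    return (u, s1), c1 + c2 + c3

def first_event_paths(max_len):
    """Count paths that leave the all-zero state and first return to it
    within max_len branches, keyed by (length J, input weight Y, distance Z)."""
    terms = Counter()
    start, w = step((0, 0), 1)          # departure requires input bit 1
    stack = [(start, 1, 1, w)]
    while stack:
        state, length, y, z = stack.pop()
        if state == (0, 0):             # first return: record and stop extending
            terms[(length, y, z)] += 1
            continue
        if length == max_len:
            continue
        for u in (0, 1):
            nxt, w = step(state, u)
            stack.append((nxt, length + 1, y + u, z + w))
    return terms

for key, count in sorted(first_event_paths(7).items()):
    print(key, count)
```

Each key (J, Y, Z) with its count corresponds to one term of the expansion of T(Y, Z, J); for instance the key (6, 3, 10) should appear with count 2, matching the term $2J^6Y^3Z^{10}$.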

The factor J is particularly important if we are transmitting a sequence of finite 
duration, say m bits. In such a case, the convolutional code is truncated after m nodes 
or m branches. This implies that the transfer function for the truncated code is obtained 
by truncating T(Y, Z, J) at the term $J^m$. On the other hand, if we are transmitting an 
extremely long sequence, i.e., essentially an infinite-length sequence, we may wish to 
suppress the dependence of T(Y, Z, J) on the parameter J. This is easily accomplished 
by setting J = 1. Hence, for the example given above, we have 

$$
\begin{aligned}
T(Y, Z) = T(Y, Z, 1) &= \frac{YZ^6}{1 - 2YZ^2} \\
&= YZ^6 + 2Y^2Z^8 + 4Y^3Z^{10} + \cdots \\
&= \sum_{d=6}^{\infty} a_d Y^{(d-4)/2} Z^d
\end{aligned}
\tag{8.1-22}
$$

where the coefficients $\{a_d\}$ are defined by Equation 8.1-19. The reader should note the 
similarity between T(Y, Z) and B(Y, Z) introduced in Equation 7.2-25, Section 7.2-3. 

The procedure outlined above for determining the transfer function of a binary 
convolutional code can be applied easily to simple codes with a small number of states. 
For a general procedure for finding the transfer function of a convolutional code based 
on application of Mason’s rule for deriving the transfer function of flow graphs, the reader 
is referred to Lin and Costello (2004). 

The procedure outlined above can be easily extended to nonbinary codes. In the 
following example, we determine the transfer function of the nonbinary convolutional 
code previously introduced in Example 8.1-4. 

EXAMPLE 8.1-5. The convolutional code shown in Figure 8.1-11 has the parameters 
K = 2, k = 2, n = 4. In this example, we have a choice of how we label distances 
and count errors, depending on whether we treat the code as binary or nonbinary. 
Suppose we treat the code as nonbinary. Thus, the input to the encoder and the output 
are treated as quaternary symbols. In particular, if we treat the input and output as 
quaternary symbols 00, 01, 10, and 11, the distance measured in symbols between the 
sequences 0111 and 0000 is 2. Furthermore, suppose that an input symbol 00 is decoded 
as the symbol 11; then we have made one symbol error. This convention applied to the 
convolutional code shown in Figure 8.1-11 results in the state diagram illustrated in 
Figure 8.1-14, from which we obtain the state equations 


$$
\begin{aligned}
X_b &= YJZ^2 X_a + YJZ X_b + YJZ X_c + YJZ^2 X_d \\
X_c &= YJZ^2 X_a + YJZ^2 X_b + YJZ X_c + YJZ X_d \\
X_d &= YJZ^2 X_a + YJZ X_b + YJZ^2 X_c + YJZ X_d \\
X_e &= JZ^2 (X_b + X_c + X_d)
\end{aligned}
\tag{8.1-23}
$$

Solution of these equations leads to the transfer function 

$$
T(Y, Z, J) = \frac{3YJ^2Z^4}{1 - 2YJZ - YJZ^2}
\tag{8.1-24}
$$
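One way to check Equation 8.1-24 without solving the full three-equation system is to exploit the symmetry of the state equations. The auxiliary symbol S is introduced here only for this sketch and is not part of the original notation. In the equations for $X_b$, $X_c$, and $X_d$, each of the three unknowns appears with total coefficient $2YJZ + YJZ^2$, so adding them gives

$$
\begin{aligned}
S &\equiv X_b + X_c + X_d = 3YJZ^2 X_a + (2YJZ + YJZ^2)\,S \\
\Rightarrow\quad S &= \frac{3YJZ^2}{1 - 2YJZ - YJZ^2}\,X_a \\
\Rightarrow\quad \frac{X_e}{X_a} &= \frac{JZ^2 S}{X_a} = \frac{3YJ^2Z^4}{1 - 2YJZ - YJZ^2}
\end{aligned}
$$

The leading term $3YJ^2Z^4$ says that there are three first-event paths of length 2, each with one nonzero input symbol and symbol distance 4 from the all-zero path.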


This expression for the transfer function is particularly appropriate when the quaternary 
symbols at the output of the encoder are mapped into a corresponding set of quaternary 
waveforms $s_m(t)$, m = 1, 2, 3, 4, e.g., four orthogonal waveforms. Thus, there is a 
one-to-one correspondence between code symbols and signal waveforms. Alternatively, for 
example, the output of the encoder may be transmitted as a sequence of binary digits 
by means of binary PSK. In such a case, it is appropriate to measure distance in terms 
of bits. When this convention is employed, the state diagram is labeled as shown in 
Figure 8.1-15. Solution of the state equations obtained from this state diagram yields 
a transfer function that is different from the one given in Equation 8.1-24. 

FIGURE 8.1-14 

State diagram for K = 2, k = 2, rate 1/2 nonbinary code. 

8.1-3 Systematic, Nonrecursive, and Recursive Convolutional Codes 

A convolutional code in which the information sequence directly appears as part of 
the code sequence is called systematic. For instance, the convolutional encoder given in 
Figure 8.1-2 depicts the encoder for a systematic convolutional code since 

$$
\mathbf{c}^{(1)} = \mathbf{u} \star \mathbf{g}_1 = \mathbf{u}
\tag{8.1-25}
$$

This shows that the information sequence u appears as part of the code sequence c. 
This can be directly seen by observing that the transform domain generator matrix of 
the code given in Equation 8.1-16 has a 1 in its first column. 

In general, if G(D) is of the form 

$$
G(D) = [\, I_k \mid P(D) \,]
\tag{8.1-26}
$$

where P(D) is a k × (n − k) polynomial matrix, the convolutional code is systematic. 
The matrix G(D) given below corresponds to a systematic convolutional code with 
n = 3 and k = 2. 

$$
G(D) =
\begin{bmatrix}
1 & 0 & 1 + D \\
0 & 1 & 1 + D + D^2
\end{bmatrix}
\tag{8.1-27}
$$

FIGURE 8.1-15 

State diagram for K = 2, k = 2, rate 1/2 convolutional code with output treated as a binary 
sequence. 


Two convolutional encoders are called equivalent if the code sequences generated 
by them are the same. Note that in the definition of equivalent convolutional encoders 
it is sufficient that the code sequences be the same; it is not required that the equal code 
sequences correspond to the same information sequences. 

EXAMPLE 8.1-6. A convolutional code with n = 3 and k = 1 is described by 

$$
G(D) = [\, 1 + D + D^2 \quad 1 + D \quad D \,]
\tag{8.1-28}
$$

The code sequences generated by this encoder are sequences of the general form 

$$
c(D) = c^{(1)}(D^3) + D\,c^{(2)}(D^3) + D^2 c^{(3)}(D^3)
\tag{8.1-29}
$$

where 

$$
\begin{aligned}
c^{(1)}(D) &= (1 + D + D^2)\,u(D) \\
c^{(2)}(D) &= (1 + D)\,u(D) \\
c^{(3)}(D) &= D\,u(D)
\end{aligned}
\tag{8.1-30}
$$

or 

$$
c(D) = (1 + D + D^3 + D^4 + D^5 + D^6)\,u(D^3)
\tag{8.1-31}
$$

The matrix G(D) can also be written as 

$$
G(D) = (1 + D + D^2)\left[\, 1 \quad \frac{1 + D}{1 + D + D^2} \quad \frac{D}{1 + D + D^2} \,\right]
     = (1 + D + D^2)\,G'(D)
\tag{8.1-32}
$$

G(D) and G'(D) are equivalent encoders, meaning that these two matrices generate the 
same set of code sequences; however, these code sequences correspond to different 
information sequences. Also note that G'(D) represents a systematic convolutional 
code. 

It is easy to verify that the information sequences u = (1, 0, 0, 0, 0, ...) and 
u' = (1, 1, 1, 0, 0, 0, 0, ...) when applied to encoders G(D) and G'(D), respectively, 
generate the same code sequence 

c = (1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, ...) 

The transform domain generator matrix G'(D) given by 

$$
G'(D) = \left[\, 1 \quad \frac{1 + D}{1 + D + D^2} \quad \frac{D}{1 + D + D^2} \,\right]
\tag{8.1-33}
$$

represents a convolutional encoder with feedback. To realize this transfer function, we 
need to use shift registers with feedback as shown in Figure 8.1-16. 

Convolutional codes that are realized using feedback shift registers are called 
recursive convolutional codes (RCCs). The transform domain generator matrix for these 
codes includes ratios of polynomials whereas in the case of nonrecursive convolutional 
codes the elements of G(D) are polynomials. Note that in recursive convolutional codes 
the existence of feedback causes the code to have infinite-length impulse responses. 

Although systematic convolutional codes are desirable, unfortunately, in general 
systematic nonrecursive convolutional codes cannot achieve the highest free distance 
possible with nonsystematic nonrecursive convolutional codes of the same rate and 
constraint length. Recursive systematic convolutional codes, however, can achieve the 
same free distance as nonsystematic nonrecursive codes for a given rate and constraint 
length, since, as Example 8.1-6 illustrates, a nonsystematic nonrecursive encoder has an 
equivalent recursive systematic encoder that generates the same set of code sequences. 
The code depicted in Figure 8.1-16 is a recursive systematic convolutional 
code (RSCC). Such codes are essential parts of turbo codes as discussed in Section 8.9. 

FIGURE 8.1-16 

Realization of G'(D) using feedback shift register. 
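The equivalence claimed in Example 8.1-6 can be checked numerically. The following sketch simulates both encoders bit by bit; the function names are hypothetical, and the recursive encoder is written as one possible shift-register realization of G'(D) (a feedback variable w with w(D) = u(D)/(1 + D + D²)), which need not match the exact layout of Figure 8.1-16.

```python
def encode_feedforward(u):
    """Nonrecursive encoder G(D) = [1+D+D^2, 1+D, D]; outputs are interleaved."""
    s1 = s2 = 0                      # u_{t-1}, u_{t-2}
    out = []
    for b in u:
        out += [b ^ s1 ^ s2, b ^ s1, s1]
        s1, s2 = b, s1
    return out

def encode_recursive(u):
    """Recursive systematic encoder G'(D) = [1, (1+D)/(1+D+D^2), D/(1+D+D^2)]."""
    w1 = w2 = 0                      # w_{t-1}, w_{t-2}
    out = []
    for b in u:
        w = b ^ w1 ^ w2              # feedback: w(D) = u(D) / (1 + D + D^2)
        out += [b, w ^ w1, w1]       # systematic bit, (1+D) w(D), D w(D)
        w1, w2 = w, w1
    return out

c1 = encode_feedforward([1, 0, 0, 0])
c2 = encode_recursive([1, 1, 1, 0])
print(c1)
print(c1 == c2)
```

Both calls should produce the sequence 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, i.e., the same code sequence from two different information sequences.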


8.1-4 The Inverse of a Convolutional Encoder and Catastrophic Codes 


One desirable property of a convolutional encoder is that in the absence of noise it 
is possible to recover the information sequence from the encoded sequence. In other 
words it is desirable that the encoding process be invertible. Clearly, any systematic 
convolutional code is invertible. 

In addition to invertibility, it is desirable that the inverse of the encoder be realizable 
using a feedforward network. The reason is that if in transmission of c(D ) one error 
occurs and the inverse function is a feedback circuit having an infinite impulse response, 
then this single error, which is equivalent to an impulse, causes an infinite number of 
errors to occur at the output. 

For a nonsystematic convolutional code, there exists a one-to-one correspondence 
between c(D) and $c^{(1)}(D), c^{(2)}(D), \ldots, c^{(n)}(D)$ and also between u(D) and 
$u^{(1)}(D), u^{(2)}(D), \ldots, u^{(k)}(D)$. Therefore, to be able to recover u(D) from c(D), we 
have to be able to recover $u^{(1)}(D), u^{(2)}(D), \ldots, u^{(k)}(D)$ from 
$c^{(1)}(D), c^{(2)}(D), \ldots, c^{(n)}(D)$. Using the relation 

$$
c(D) = u(D)\,G(D)
\tag{8.1-34}
$$

we conclude that the code is invertible if G(D) is invertible. Therefore the condition 
for invertibility of a convolutional code is that for the k × n matrix G(D) there must 
exist an n × k inverse matrix $G^{-1}(D)$ such that 

$$
G(D)\,G^{-1}(D) = D^l I_k
\tag{8.1-35}
$$

where l ≥ 0 is an integer representing a delay of l time units between the input and the 
output. 

The following result due to Massey and Sain (1968) provides the necessary and 
sufficient condition under which a feedforward inverse for G(D) exists. 

An (n, 1) convolutional code with 

$$
G(D) = [\, g_1(D) \quad g_2(D) \quad \cdots \quad g_n(D) \,]
\tag{8.1-36}
$$

has a feedforward inverse with delay l if and only if for some l ≥ 0 we have 

$$
\mathrm{GCD}\{ g_i(D),\ 1 \le i \le n \} = D^l
\tag{8.1-37}
$$

where GCD denotes the greatest common divisor. For (n, k) convolutional codes the 
condition is 

$$
\mathrm{GCD}\left\{ \Delta_i(D),\ 1 \le i \le \binom{n}{k} \right\} = D^l
\tag{8.1-38}
$$

where $\Delta_i(D)$, $1 \le i \le \binom{n}{k}$, denote the determinants of the $\binom{n}{k}$ distinct k × k submatrices 
of G(D). 
FIGURE 8.1-17 

A catastrophic convolutional encoder. 


Convolutional codes for which a feedforward inverse does not exist are called 
catastrophic convolutional codes. When a catastrophic convolutional code is used on a 
binary symmetric channel, it is possible for a finite number of channel errors to cause an 
infinite number of decoding errors. For simple codes, such a code can be identified from 
its state diagram. It will contain a zero-distance path (a path with multiplier $D^0 = 1$) 
from some nonzero state back to the same state. This means that one can loop around this 
zero-distance path an infinite number of times without increasing the distance relative to 
the all-zero path. But, if this self-loop corresponds to the transmission of a 1, the decoder 
will make an infinite number of errors. For general convolutional codes, conditions given 
in Equations 8.1-37 and 8.1-38 must be satisfied for the code to be noncatastrophic. 

EXAMPLE 8.1-7. Consider the k = 1, n = 2, K = 3 convolutional code shown in 
Figure 8.1-17. For this code G(D) is given by 

$$
G(D) = [\, 1 + D \quad 1 + D^2 \,]
\tag{8.1-39}
$$

and since $\mathrm{GCD}\{1 + D,\ 1 + D^2\} = 1 + D \ne D^l$, the code is catastrophic. The state 
diagram for this code is shown in Figure 8.1-18. The existence of the self-loop from 
state 11 to itself corresponding to an input sequence of weight 1 and output sequence 
of weight 0 results in catastrophic behavior for this code. 



FIGURE 8.1-18 

The state diagram for the catastrophic code of Figure 8.1-17. 
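For rate 1/n codes, the condition of Equation 8.1-37 is easy to test mechanically. The sketch below is illustrative: it represents a GF(2) polynomial as a Python integer whose bit i is the coefficient of $D^i$ (a convention chosen here, not taken from the text), and the function names are hypothetical.

```python
from functools import reduce

def gf2_mod(a, b):
    """Remainder of a(D) divided by b(D) over GF(2); b must be nonzero."""
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a, b):
    """Greatest common divisor of two GF(2) polynomials (Euclid's algorithm)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def is_catastrophic(generators):
    """True if GCD{g_i(D)} is not of the form D^l (rate 1/n codes only)."""
    g = reduce(gf2_gcd, generators)
    # D^l has exactly one nonzero coefficient, i.e., a single set bit.
    return g & (g - 1) != 0

# Example 8.1-7: g1 = 1 + D -> 0b11, g2 = 1 + D^2 -> 0b101
print(is_catastrophic([0b11, 0b101]))    # GCD is 1 + D
# Rate 1/2, K = 3 code with octal generators 5 and 7 (Table 8.3-1)
print(is_catastrophic([0b101, 0b111]))   # GCD is 1
```

The first call reports a catastrophic code and the second a noncatastrophic one. The test covers only k = 1; for general (n, k) codes the determinants of Equation 8.1-38 would be needed.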
There exist different methods for decoding of convolutional codes. Similar to block 
codes, the decoding of convolutional codes can be done either by soft decision or by hard 
decision decoding. In addition, the optimal decoding of convolutional codes can employ 
the maximum-likelihood or the maximum a posteriori principle. For convolutional 
codes with high constraint lengths, optimal decoding algorithms become too complex. 
Suboptimal decoding algorithms are usually used in such cases. 


8.2-1 Maximum-Likelihood Decoding of Convolutional 
Codes — The Viterbi Algorithm 

In the decoding of a block code for a memoryless channel, we computed the distances 
(Hamming distance for hard-decision decoding and Euclidean distance for soft-decision 
decoding) between the received codeword and the $2^k$ possible transmitted codewords. 
Then we selected the codeword that was closest in distance to the received codeword. 
This decision rule, which requires the computation of $2^k$ metrics, is optimum in the 
sense that it results in a minimum probability of error for the binary symmetric channel 
with p < 1/2 and the additive white Gaussian noise channel. 

Unlike a block code, which has a fixed length n, a convolutional encoder is basically 
a finite-state machine. Hence the optimum decoder is a maximum-likelihood sequence 
estimator (MLSE) of the type described in Section 4.8-1 for signals with memory. 
Therefore, optimum decoding of a convolutional code involves a search through the 
trellis for the most probable sequence. Depending on whether the detector following 
the demodulator performs hard or soft decisions, the corresponding metric in the trel- 
lis search may be either a Hamming metric or a Euclidean metric, respectively. We 
elaborate below, using the trellis in Figure 8.1-6 for the convolutional code shown in 
Figure 8.1-2. 

Consider the two paths in the trellis that begin at the initial state a and remerge at 
state a after three state transitions (three branches), corresponding to the two 
information sequences 000 and 100 and the transmitted sequences 000 000 000 and 111 001 
011, respectively. We denote the transmitted bits by $\{c_{jm},\ j = 1, 2, 3;\ m = 1, 2, 3\}$, 
where the index j indicates the jth branch and the index m the mth bit in that branch. 
Correspondingly, we define $\{r_{jm},\ j = 1, 2, 3;\ m = 1, 2, 3\}$ as the output of the 
demodulator. If the decoder performs hard decision decoding, the detector output for 
each transmitted bit is either 0 or 1. On the other hand, if soft decision decoding is 
employed and the coded sequence is transmitted by binary coherent PSK, the input to 
the decoder is 

$$
r_{jm} = \sqrt{\mathcal{E}_c}\,(2c_{jm} - 1) + n_{jm}
\tag{8.2-1}
$$

where $n_{jm}$ represents the additive noise and $\mathcal{E}_c$ is the transmitted signal energy for each 
code bit. 
A metric is defined for the jth branch of the ith path through the trellis as the 
logarithm of the joint probability of the sequence $\{r_{jm},\ m = 1, 2, 3\}$ conditioned on the 
transmitted sequence $\{c_{jm}^{(i)},\ m = 1, 2, 3\}$ for the ith path. That is, 

$$
\mu_j^{(i)} = \log p\big(\mathbf{r}_j \mid \mathbf{c}_j^{(i)}\big), \qquad j = 1, 2, 3, \ldots
\tag{8.2-2}
$$

Furthermore, a metric for the ith path consisting of B branches through the trellis is 
defined as 

$$
PM^{(i)} = \sum_{j=1}^{B} \mu_j^{(i)}
\tag{8.2-3}
$$

The criterion for deciding between two paths through the trellis is to select the one 
having the larger metric. This rule maximizes the probability of a correct decision, or, 
equivalently, it minimizes the probability of error for the sequence of information bits. 
For example, suppose that hard decision decoding is performed by the demodulator, 
yielding the received sequence {101 000 100}. Let i = 0 denote the three-branch all-zero 
path and i = 1 the second three-branch path that begins in the initial state a and 
remerges with the all-zero path at state a after three transitions. The metrics for these 
two paths are 

$$
\begin{aligned}
PM^{(0)} &= 6 \log(1 - p) + 3 \log p \\
PM^{(1)} &= 4 \log(1 - p) + 5 \log p
\end{aligned}
\tag{8.2-4}
$$

where p is the probability of a bit error. Assuming that p < 1/2, we find that the metric 
$PM^{(0)}$ is larger than the metric $PM^{(1)}$. This result is consistent with the observation that 
the all-zero path is at Hamming distance d = 3 from the received sequence, while the 
i = 1 path is at Hamming distance d = 5 from the received sequence. Thus, the Hamming 
distance is an equivalent metric for hard decision decoding. 
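The two path metrics of Equation 8.2-4 can be reproduced with a few lines of code. This is a minimal sketch with hypothetical function names; the crossover probability p = 0.1 is an arbitrary illustrative value.

```python
import math

def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def path_metric(received, candidate, p):
    """Log-likelihood of `received` given `candidate` on a BSC with crossover p."""
    d = hamming(received, candidate)
    n = len(received)
    return (n - d) * math.log(1 - p) + d * math.log(p)

received = "101000100"
path0 = "000000000"   # all-zero path (i = 0)
path1 = "111001011"   # path for information sequence 100 (i = 1)

for name, path in (("PM(0)", path0), ("PM(1)", path1)):
    print(name, hamming(received, path), path_metric(received, path, 0.1))
```

For any p < 1/2 the path with the smaller Hamming distance has the larger log-likelihood, which is why the decoder can work with distances alone.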

Similarly, suppose that soft decision decoding is employed and the channel adds 
white Gaussian noise to the signal. Then the demodulator output is described statistically 
by the probability density function 

$$
p\big(r_{jm} \mid c_{jm}^{(i)}\big) = \frac{1}{\sqrt{2\pi\sigma^2}}
\exp\left\{ -\frac{\big[\, r_{jm} - \sqrt{\mathcal{E}_c}\,\big(2c_{jm}^{(i)} - 1\big) \big]^2}{2\sigma^2} \right\}
\tag{8.2-5}
$$

where $\sigma^2 = \tfrac{1}{2}N_0$ is the variance of the additive Gaussian noise. If we neglect the terms 
that are common to all branch metrics, the branch metric for the jth branch of the ith 
path may be expressed as 

$$
\mu_j^{(i)} = \sum_{m=1}^{n} r_{jm}\big(2c_{jm}^{(i)} - 1\big)
\tag{8.2-6}
$$
where, in our example, n = 3. Thus the correlation metrics for the two paths under 
consideration are 

$$
\begin{aligned}
CM^{(0)} &= \sum_{j=1}^{3} \sum_{m=1}^{3} r_{jm}\big(2c_{jm}^{(0)} - 1\big) \\
CM^{(1)} &= \sum_{j=1}^{3} \sum_{m=1}^{3} r_{jm}\big(2c_{jm}^{(1)} - 1\big)
\end{aligned}
\tag{8.2-7}
$$

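From the above discussion it is observed that for ML decoding we need to look for 
a code sequence $\hat{\mathbf{c}}^{(m)}$ in the trellis T that satisfies 

$$
\hat{\mathbf{c}}^{(m)} =
\begin{cases}
\displaystyle \arg\max_{\mathbf{c} \in T} \sum_j \log p(\mathbf{r}_j \mid \mathbf{c}_j), & \text{for a general memoryless channel} \\[2ex]
\displaystyle \arg\min_{\mathbf{c} \in T} \sum_j \lVert \mathbf{r}_j - \mathbf{c}_j \rVert^2, & \text{for soft decision decoding} \\[2ex]
\displaystyle \arg\min_{\mathbf{c} \in T} \sum_j d_H(\mathbf{y}_j, \mathbf{c}_j), & \text{for hard decision decoding}
\end{cases}
\tag{8.2-8}
$$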

Note that for hard decision decoding y denotes the result of binary (hard) decisions 
on the demodulator output r. Also in the hard decision case, c denotes the binary 
encoded sequence whose components are 0 and 1, whereas in the soft decision case the 
components of c are $\pm\sqrt{\mathcal{E}_c}$. What is clear from above is that in all cases 
maximum-likelihood decoding requires finding a path in the trellis that minimizes or maximizes 
an additive metric. This is done by using the Viterbi algorithm as discussed below. 

We consider the two paths described above, which merge at state a after three 
transitions. Note that any particular path through the trellis that stems from this node 
will add identical terms to the path metrics $CM^{(0)}$ and $CM^{(1)}$. Consequently, if $CM^{(0)} > CM^{(1)}$ 
at the merged node a after three transitions, $CM^{(0)}$ will continue to be larger than 
$CM^{(1)}$ for any path that stems from node a. This means that the path corresponding 
to $CM^{(1)}$ can be discarded from further consideration. The path corresponding to the 
metric $CM^{(0)}$ is the survivor. Similarly, one of the two paths that merge at state b can be 
eliminated on the basis of the two corresponding metrics. This procedure is repeated at 
state c and state d. As a result, after the first three transitions, there are four surviving 
paths, one terminating at each state, and a corresponding metric for each survivor. 
This procedure is repeated at each stage of the trellis as new signals are received in 
subsequent time intervals. 
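The survivor-selection procedure described above can be sketched as a small hard-decision Viterbi decoder. This is an illustrative implementation under stated assumptions, not the text's own algorithm listing: the rate 1/3, K = 3 encoder is assumed to have generators [100], [101], [111]; the trellis is assumed to start and end in the all-zero state (two tail zeros are appended by the caller); full survivor paths are stored, with no truncation of the decoding delay. The names `encode` and `viterbi_hard` are hypothetical.

```python
def encode(bits):
    """Rate 1/3, K = 3 encoder with assumed generators [100], [101], [111]."""
    s1 = s2 = 0
    out = []
    for u in bits:
        out += [u, u ^ s2, u ^ s1 ^ s2]
        s1, s2 = u, s1
    return out

def viterbi_hard(received, n_info):
    """Minimum Hamming distance path through the trellis.
    `received` holds 3 bits per branch; decoding starts and ends in state (0, 0)."""
    INF = float("inf")
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    metric = {s: (0 if s == (0, 0) else INF) for s in states}
    paths = {(0, 0): []}
    for j in range(0, len(received), 3):
        r = received[j:j + 3]
        new_metric = {s: INF for s in states}
        new_paths = {}
        for (s1, s2) in states:
            if metric[(s1, s2)] == INF:
                continue                      # state not yet reachable
            for u in (0, 1):
                c = (u, u ^ s2, u ^ s1 ^ s2)  # branch output bits
                d = sum(a != b for a, b in zip(r, c))
                nxt = (u, s1)
                cand = metric[(s1, s2)] + d
                if cand < new_metric[nxt]:    # keep only the survivor at each state
                    new_metric[nxt] = cand
                    new_paths[nxt] = paths[(s1, s2)] + [u]
        metric, paths = new_metric, new_paths
    return paths[(0, 0)][:n_info]

msg = [1, 0, 1, 1, 0]
tx = encode(msg + [0, 0])        # two tail bits return the encoder to state 00
rx = tx[:]
rx[2] ^= 1                       # introduce two channel errors
rx[10] ^= 1
print(viterbi_hard(rx, len(msg)) == msg)
```

Because the free distance of this code is 6, two channel errors leave the transmitted sequence strictly closest to the received one, so the decoder should recover the message in this usage example.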

In general, when a binary convolutional code with k = 1 and constraint length 
K is decoded by means of the Viterbi algorithm, there are $2^{K-1}$ states. Hence, there 
are $2^{K-1}$ surviving paths at each stage and $2^{K-1}$ metrics, one for each surviving path. 
Furthermore, a binary convolutional code in which k bits at a time are shifted into 
an encoder that consists of K (k-bit) shift-register stages generates a trellis that has 
$2^{k(K-1)}$ states. Consequently, the decoding of such a code by means of the Viterbi 
algorithm requires keeping track of $2^{k(K-1)}$ surviving paths and $2^{k(K-1)}$ metrics. At 
each stage of the trellis, there are $2^k$ paths that merge at each node. Since each path 
that converges at a common node requires the computation of a metric, there are 
$2^k$ metrics computed for each node. Of the $2^k$ paths that merge at each node, only 
one survives, and this is the most probable (minimum-distance) path. Thus, the number 
of computations in decoding performed at each stage increases exponentially with k 
and K. The exponential increase in computational burden limits the use of the Viterbi 
algorithm to relatively small values of K and k. 

The decoding delay in decoding a long information sequence that has been 
convolutionally encoded is usually too long for most practical applications. Moreover, the 
memory required to store the entire length of surviving sequences is large and 
expensive. As indicated in Section 4.8-1, a solution to this problem is to modify the Viterbi 
algorithm in a way which results in a fixed decoding delay without significantly 
affecting the optimal performance of the algorithm. Recall that the modification is to retain 
at any given time t only the most recent δ decoded information bits (symbols) in each 
surviving sequence. As each new information bit (symbol) is received, a final decision 
is made on the bit (symbol) received δ branches back in the trellis, by comparing the 
metrics in the surviving sequences and deciding in favor of the bit in the sequence 
having the largest metric. If δ is chosen sufficiently large, all surviving sequences will 
contain the identical decoded bit (symbol) δ branches back in time. That is, with high 
probability, all surviving sequences at time t stem from the same node at t − δ. It has 
been found experimentally (computer simulation) that a delay δ ≥ 5K results in a 
negligible degradation in the performance relative to the optimum Viterbi algorithm. 


8.2-2 Probability of Error for Maximum-Likelihood Decoding 
of Convolutional Codes 

In deriving the probability of error for convolutional codes, the linearity property for 
this class of codes is employed to simplify the derivation. That is, we assume that the 
all-zero sequence is transmitted, and we determine the probability of error in deciding 
in favor of another sequence. 

Since the convolutional code does not necessarily have a fixed length, we derive 
its performance from the probability of error for sequences that merge with the all-zero 
sequence for the first time at a given node in the trellis. In particular, we define the 
first-event error probability as the probability that another path that merges with the 
all-zero path at node B has a metric that exceeds the metric of the all-zero path for 
the first time. Of course in transmission of convolutional codes, other types of errors 
can occur; but it can be shown that bounding the error probability of the convolutional 
code by the sum of first-event error probabilities provides an upper bound that, although 
conservative, in most cases is a usable bound on the error probability. The interested 
reader can refer to the book by Lin and Costello (2004) for details. 

As we have previously discussed in Section 8.1-2, the transfer function of a con- 
volutional code is similar to the WEF of a block code with two differences. First, it 
considers only the first-event errors; and second, it does not include the all-zero code 
sequence. Therefore, parallel to the argument we presented for block codes in Sec- 
tion 7.2-4, we can derive bounds on sequence and bit error probability of convolutional 
codes. 
The sequence error probability of a convolutional code is bounded by 

$$
P_e \le T(Z)\,\Big|_{Z = \Delta}
\tag{8.2-9}
$$

where 

$$
\Delta = \sum_{y \in \mathscr{Y}} \sqrt{p(y \mid 0)\,p(y \mid 1)}
\tag{8.2-10}
$$

Note that unlike Equation 7.2-39, which states that for linear block codes $P_e \le A(\Delta) - 1$, 
here we do not need to subtract 1 from T(Z) since T(Z) does not include the all-zero 
path. Equation 8.2-9 can be written as 

$$
P_e \le \sum_{d = d_{\text{free}}}^{\infty} a_d \Delta^d
\tag{8.2-11}
$$

The bit error probability for a convolutional code follows from Equation 7.2-48 as 

$$
P_b \le \frac{1}{k}\,\frac{\partial}{\partial Y} T(Y, Z)\,\bigg|_{Y = 1,\ Z = \Delta}
\tag{8.2-12}
$$

From Example 6.8-1 we know that if the modulation is BPSK (or QPSK) and the 
channel is an AWGN channel with soft decision decoding, then 

$$
\Delta = e^{-R_c \gamma_b}
\tag{8.2-13}
$$

and in case of hard decision decoding, where the channel model is a binary symmetric 
channel with crossover probability of p, we have 

$$
\Delta = \sqrt{4p(1 - p)}
\tag{8.2-14}
$$


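Therefore, we have the following upper bounds for the bit error probability of a 
convolutional code: 

$$
P_b \le
\begin{cases}
\dfrac{1}{k}\,\dfrac{\partial}{\partial Y} T(Y, Z)\,\bigg|_{Y = 1,\ Z = \exp(-R_c \gamma_b)} & \text{BPSK with soft decision decoding} \\[3ex]
\dfrac{1}{k}\,\dfrac{\partial}{\partial Y} T(Y, Z)\,\bigg|_{Y = 1,\ Z = \sqrt{4p(1 - p)}} & \text{hard decision decoding}
\end{cases}
\tag{8.2-15}
$$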


In hard decision decoding we can employ direct expressions for the pairwise error 
probability instead of using the Bhattacharyya bound. This results in tighter bounds on 
the error probability. The probability of selecting a path of weight d, when d is odd, 
over the all-zero path is the probability that the number of errors at these locations is 
greater than or equal to (d + 1)/2. Therefore, the pairwise error probability is given by 

$$
P_2(d) = \sum_{k = (d+1)/2}^{d} \binom{d}{k} p^k (1 - p)^{d-k}
\tag{8.2-16}
$$

If d is even, the incorrect path is selected when the number of errors exceeds d/2. If the 
number of errors equals d/2, there is a tie between the metrics in the two paths, which 
may be resolved by randomly selecting one of the paths; thus, an error occurs one-half 
the time. Consequently, the pairwise error probability in this case is given by 

$$
P_2(d) = \frac{1}{2}\binom{d}{d/2} p^{d/2} (1 - p)^{d/2} + \sum_{k = d/2 + 1}^{d} \binom{d}{k} p^k (1 - p)^{d-k}
\tag{8.2-17}
$$

The error probability is bounded by 

$$
P_e \le \sum_{d = d_{\text{free}}}^{\infty} a_d P_2(d)
\tag{8.2-18}
$$

where $P_2(d)$ is substituted from Equations 8.2-16 and 8.2-17, for odd and even values 
of d, respectively. 

A similar tighter bound for the bit error probability can also be derived by using 
the same approach. The result is given by 

$$
P_b \le \frac{1}{k} \sum_{d = d_{\text{free}}}^{\infty} \beta_d P_2(d)
\tag{8.2-19}
$$

where $\beta_d$ are the coefficients of $Z^d$ in the expansion of $\frac{\partial}{\partial Y} T(Y, Z)$ computed at Y = 1. 

A comparison of the error probability for the rate 1/3, K = 3 convolutional code 
with soft decision decoding and hard decision decoding is made in Figure 8.2-1. Note 
that the upper bound given by Equation 8.2-15 for hard decision decoding is less 
than 1 dB above the tighter upper bound given by Equation 8.2-19 in conjunction 
with Equations 8.2-16 and 8.2-17. The advantage of the Bhattacharyya bound is its 
computational simplicity. In comparing the performance between soft decision and 
hard decision decoding, note that the difference obtained from the upper bounds is 
approximately 2.5 dB for $10^{-6} \le P_b \le 10^{-2}$. 

FIGURE 8.2-1 

Comparison of soft decision and hard decision decoding for K = 3, k = 1, n = 3 
convolutional code. 
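The two hard-decision bounds can be evaluated numerically for the rate 1/3, K = 3 example, for which $T(Y, Z) = YZ^6/(1 - 2YZ^2)$ and k = 1. The sketch below is illustrative only: the closed forms for the derivative and for $\beta_d$ follow from differentiating this T(Y, Z), the series is truncated at an arbitrary `d_max`, the crossover probability p = 0.01 is an arbitrary test value, and all function names are hypothetical.

```python
import math

def pairwise_error(d, p):
    """P_2(d) of Equations 8.2-16 (odd d) and 8.2-17 (even d) for a BSC."""
    if d % 2 == 1:
        return sum(math.comb(d, k) * p**k * (1 - p)**(d - k)
                   for k in range((d + 1) // 2, d + 1))
    half = d // 2
    tie = 0.5 * math.comb(d, half) * p**half * (1 - p)**half
    return tie + sum(math.comb(d, k) * p**k * (1 - p)**(d - k)
                     for k in range(half + 1, d + 1))

def beta(d):
    """Coefficient of Z^d in dT/dY at Y = 1, i.e., in Z^6 / (1 - 2Z^2)^2."""
    if d < 6 or d % 2:
        return 0
    return ((d - 4) // 2) * 2 ** ((d - 6) // 2)

def bhattacharyya_bound(p):
    """Equation 8.2-15, hard decision case; meaningful only when 2*Z^2 < 1."""
    z = math.sqrt(4 * p * (1 - p))
    return z**6 / (1 - 2 * z**2) ** 2

def tighter_bound(p, d_max=60):
    """Equation 8.2-19 truncated at d_max (k = 1 for this code)."""
    return sum(beta(d) * pairwise_error(d, p) for d in range(6, d_max + 1, 2))

p = 0.01
print(bhattacharyya_bound(p), tighter_bound(p))
```

Since $P_2(d) \le \Delta^d$ term by term, the second value should never exceed the first; the gap between them is what the text describes as less than 1 dB on the error-rate curves.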

Finally, we should mention that the ensemble average error rate performance of 
a convolutional code on a discrete memoryless channel, just as in the case of a block 
code, can be expressed in terms of the cutoff rate parameter $R_0$ as (for the derivation, 
see Viterbi and Omura (1979)) 

$$
P_b < \frac{(q - 1)\,q^{-K R_0 / R_c}}{\big[\, 1 - q^{-(R_0 - R_c)/R_c} \big]^2}, \qquad R_c \le R_0
\tag{8.2-20}
$$

where q is the number of channel input symbols, K is the constraint length of the code, 
$R_c$ is the code rate, and $R_0$ is the cutoff rate defined in Chapter 6. Therefore, conclusions 
reached by computing $R_0$ for various channel conditions apply to both block codes and 
convolutional codes. 


8.3 DISTANCE PROPERTIES OF BINARY CONVOLUTIONAL CODES 


In this subsection, we shall tabulate the minimum free distance and the generators for 
several binary, short-constraint-length convolutional codes for several code rates. These 
binary codes are optimal in the sense that, for a given rate and a given constraint length, 
they have the largest possible $d_{\text{free}}$. The generators and the corresponding values of 
$d_{\text{free}}$ tabulated below have been obtained by Odenwalder (1970), Larsen (1973), Paaske 
(1974), and Daut et al. (1982) using computer search methods. 

Heller (1968) has derived a relatively simple upper bound on the minimum free 
distance of a rate 1/n convolutional code. It is given by 

$$
d_{\text{free}} \le \min_{l \ge 1} \left\lfloor \frac{2^{l-1}}{2^l - 1}\,(K + l - 1)\,n \right\rfloor
\tag{8.3-1}
$$

where $\lfloor x \rfloor$ denotes the largest integer contained in x. For purposes of comparison, this 
upper bound is also given in the tables for the rate 1/n codes. For rate k/n convolutional 
codes, Daut et al. (1982) have given a modification to Heller’s bound. The values 
obtained from this upper bound for k/n are also tabulated. 

Tables 8.3-1 to 8.3-7 list the parameters of rate 1/n convolutional codes for n = 
2, 3, ..., 8. Tables 8.3-8 to 8.3-11 list the parameters of several rate k/n convolutional 
codes for k ≤ 4 and n ≤ 8. 
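Heller's bound in Equation 8.3-1 is straightforward to evaluate. The sketch below uses integer arithmetic for the floor; the function name and the search limit `l_max` are assumptions made for illustration (for large l the bracketed term grows roughly like (K + l − 1)n/2, so the minimum is attained at small l).

```python
def heller_bound(K, n, l_max=64):
    """Upper bound on d_free for a rate 1/n code of constraint length K."""
    return min((2 ** (l - 1) * (K + l - 1) * n) // (2 ** l - 1)
               for l in range(1, l_max + 1))

# Compare with the "Upper Bound on d_free" columns of Tables 8.3-1 and 8.3-2.
for K in (3, 4, 5, 7, 8):
    print(K, heller_bound(K, 2), heller_bound(K, 3))
```

For example, K = 3 gives 5 for rate 1/2 and 8 for rate 1/3, matching the tabulated upper bounds.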


■ 8.4 

PUNCTURED CONVOLUTIONAL CODES 

In some practical applications, there is a need to employ high-rate convolutional codes, 
e.g., rates of (n — 1 )/n. As we have observed, the trellis for such high-rate codes has 
2" _1 branches that enter each state. Consequently, there are 2" _1 metric computations 
per state that must be performed in implementing the Viterbi algorithm and as many 
■ TABLE 8.3-1 

Rate 1/2 Maximum Free Distance Codes 

Constraint                                     Upper Bound
Length K     Generators in Octal     d_free    on d_free
    3             5        7            5          5
    4            15       17            6          6
    5            23       35            7          8
    6            53       75            8          8
    7           133      171           10         10
    8           247      371           10         11
    9           561      753           12         12
   10         1,167    1,545           12         13
   11         2,335    3,661           14         14
   12         4,335    5,723           15         15
   13        10,533   17,661           16         16
   14        21,675   27,123           16         17

Sources: Odenwalder (1970) and Larsen (1973). 


comparisons of the updated metrics to select the best path at each state. Therefore, the 
implementation of the decoder of a high-rate code can be very complex. 

The computational complexity inherent in the implementation of the decoder of a 
high-rate convolutional code can be avoided by designing the high-rate code from a 
low-rate code in which some of the coded bits are deleted from transmission. The deletion of 
selected coded bits at the output of a convolutional encoder is called puncturing, as 
previously discussed in Section 7.8-2. Thus, one can generate high-rate convolutional codes 
by puncturing rate 1/n codes with the result that the decoder maintains the low 
complexity of the rate 1/n code. We note, of course, that puncturing a code reduces the free 
distance of the rate 1/n code by some amount that depends on the degree of puncturing. 

The puncturing process may be described as periodically deleting selected bits 
from the output of the encoder, thus creating a periodically time-varying trellis code. 


TABLE 8.3-2 

Rate 1/3 Maximum Free Distance Codes 

Constraint                                         Upper Bound
Length K       Generators in Octal       d_free    on d_free
    3            5        7        7        8          8
    4           13       15       17       10         10
    5           25       33       37       12         12
    6           47       53       75       13         13
    7          133      145      175       15         15
    8          225      331      367       16         16
    9          557      663      711       18         18
   10        1,117    1,365    1,633       20         20
   11        2,353    2,671    3,175       22         22
   12        4,767    5,723    6,265       24         24
   13       10,533   10,675   17,661       24         24
   14       21,645   35,661   37,133       26         26

Sources: Odenwalder (1970) and Larsen (1973). 
■ TABLE 8.3-3 

Rate 1/4 Maximum Free Distance Codes 

Constraint                                                  Upper Bound
Length K           Generators in Octal            d_free    on d_free
    3            5        7        7        7       10         10
    4           13       15       15       17       13         15
    5           25       27       33       37       16         16
    6           53       67       71       75       18         18
    7          135      135      147      163       20         20
    8          235      275      313      357       22         22
    9          463      535      733      745       24         24
   10        1,117    1,365    1,633    1,653       27         27
   11        2,327    2,353    2,671    3,175       29         29
   12        4,767    5,723    6,265    7,455       32         32
   13       11,145   12,477   15,537   16,727       33         33
   14       21,113   23,175   35,527   35,537       36         36

Source: Larsen (1973). 






TABLE 8.3-4 

Rate 1/5 Maximum Free Distance Codes 

Constraint                                          Upper Bound
Length K        Generators in Octal       d_free    on d_free
    3         7     7     7     5     5     13         13
    4        17    17    13    15    15     16         16
    5        37    27    33    25    35     20         20
    6        75    71    73    65    57     22         22
    7       175   131   135   135   147     25         25
    8       257   233   323   271   357     28         28

Source: Daut et al. (1982). 


TABLE 8.3-5 

Rate 1/6 Maximum Free Distance Codes 

Constraint                                   Upper Bound
Length K     Generators in Octal   d_free    on d_free
    3          7     7     7         16         16
               7     5     5
    4         17    17    13         20         20
              13    15    15
    5         37    35    27         24         24
              33    25    35
    6         73    75    55         27         27
              65    47    57
    7        173   151   135         30         30
             135   163   137
    8        253   375   331         34         34
             235   313   357

Source: Daut et al. (1982). 
TABLE 8.3-6 

Rate 1/7 Maximum Free Distance Codes 

Constraint                                        Upper Bound
Length K       Generators in Octal      d_free    on d_free
    3          7     7     7     7        18         18
               5     5     5
    4         17    17    13    13        23         23
              13    15    15
    5         35    27    25    27        28         28
              33    35    37
    6         53    75    65    75        32         32
              47    67    57
    7        165   145   173   135        36         36
             135   147   137
    8        275   253   375   331        40         40
             235   313   357

Source: Daut et al. (1982). 


TABLE 8.3-7 

Rate 1/8 Maximum Free Distance Codes 

Constraint                                        Upper Bound
Length K       Generators in Octal      d_free    on d_free
    3          7     7     5     5        21         21
               5     7     7     7
    4         17    17    13    13        26         26
              13    15    15    17
    5         37    33    25    25        32         32
              35    33    27    37
    6         57    73    51    65        36         36
              75    47    67    57
    7        153   111   165   173        40         40
             135   135   147   137
    8        275   275   253   371        45         45
             331   235   313   357

Source: Daut et al. (1982). 


■ TABLE 8.3-8 

Rate 2/3 Maximum Free Distance Codes 

Constraint                                   Upper Bound
Length K     Generators in Octal   d_free    on d_free
    2         17     6    15          3          4
    3         27    75    72          5          6
    4        236   155   337          7          7

Source: Daut et al. (1982). 
■ TABLE 8.3-9 

Rate k/5 Maximum Free Distance Codes 

        Constraint                                          Upper Bound
Rate    Length K        Generators in Octal       d_free    on d_free
2/5         2        17    07    11    12    04      6          6
            3        27    71    52    65    57     10         10
            4       247   366   171   266   373     12         12
3/5         2        35    23    75    61    47      5          5
4/5         2       237   274   156   255   337      3          4

Source: Daut et al. (1982). 


■ TABLE 8.3-10 

Rate k/7 Maximum Free Distance Codes 

        Constraint                                        Upper Bound
Rate    Length K       Generators in Octal      d_free    on d_free
2/7         2         05    06    12    15         9          9
                      15    13    17
            3         33    55    72    47        14         14
                      25    53    75
            4        312   125   247   366        18         18
                     171   266   373
3/7         2         45    21    36    62         8          8
                      57    43    71
4/7         2        130   067   237   274         6          7
                     156   255   337

Source: Daut et al. (1982). 


■ TABLE 8.3-11 

Rate 3/4 and 3/8 Maximum Free Distance Codes 

        Constraint                                        Upper Bound
Rate    Length K       Generators in Octal      d_free    on d_free
3/4         2         13    25    61    47
3/8         2         15    42    23    61
                      51    36    75    47

Source: Daut et al. (1982). 


We begin with a rate 1/n parent code and define a puncturing period P, corresponding 
to P input information bits to the encoder. Hence, in one period, the encoder outputs nP 
coded bits. Associated with the nP encoded bits is a puncturing matrix P of the form 

$$\mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1P} \\ p_{21} & p_{22} & \cdots & p_{2P} \\ \vdots & \vdots & & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nP} \end{bmatrix} \tag{8.4-1}$$
where each column of P corresponds to the n possible output bits from the encoder for 
each input bit and each element of P is either 0 or 1. When $p_{ij} = 1$, the corresponding 
output bit from the encoder is transmitted. When $p_{ij} = 0$, the corresponding output bit 
from the encoder is deleted. Thus, the code rate is determined by the period P and the 
number of bits deleted. 

If we delete N bits out of nP, the code rate is P/(nP - N), where N may take 
any integer value in the range 0 to (n - 1)P - 1. Hence, the achievable code rates are 

$$R_c = \frac{P}{P + M}, \qquad M = 1, 2, \ldots, (n - 1)P \tag{8.4-2}$$

EXAMPLE 8.4-1. Let us construct a rate 3/4 code by puncturing the output of the rate 
1/3, K = 3 encoder shown in Figure 8.1-2. There are many choices for P and M 
in Equation 8.4-2 to achieve the desired rate. We may take the smallest value of P, 
namely, P = 3. Then out of every nP = 9 output bits, we delete N = 5 bits. Thus, 
we achieve a rate 3/4 punctured convolutional code. As the puncturing matrix, we may 
select P as 

$$\mathbf{P} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{8.4-3}$$

Figure 8.4-1 illustrates the generation of the punctured code from the rate 1/3 parent 
code. The corresponding trellis for the punctured code is also shown in Figure 8.4-1. 
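
The puncturing operation of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `puncture` and `punctured_rate` are hypothetical, the coded bits are assumed to arrive as a flat list with the n outputs for each input bit in consecutive positions, and column t mod P of the matrix is assumed to apply to input bit t.

```python
from fractions import Fraction

def puncture(coded, pmat):
    """Keep only the coded bits whose puncturing-matrix entry is 1.

    coded: flat list of encoder outputs, n bits per input bit, in time order.
    pmat:  n rows by P columns of 0/1 entries (Eq. 8.4-1).
    """
    n, period = len(pmat), len(pmat[0])
    if len(coded) % n:
        raise ValueError("coded length must be a multiple of n")
    kept = []
    for idx, bit in enumerate(coded):
        t, i = divmod(idx, n)      # t: input-bit index, i: encoder output index
        if pmat[i][t % period]:    # entry 1 means "transmit", 0 means "delete"
            kept.append(bit)
    return kept

def punctured_rate(pmat):
    """Code rate P / (number of transmitted bits per period)."""
    return Fraction(len(pmat[0]), sum(map(sum, pmat)))

# Puncturing matrix of Eq. 8.4-3 (rate 1/3 parent code, P = 3).
P_EXAMPLE = [[1, 1, 1],
             [1, 0, 0],
             [0, 0, 0]]
```

With this matrix, `punctured_rate(P_EXAMPLE)` gives 3/4, and puncturing nine coded bits labeled 0 through 8 keeps the bits at positions 0, 1, 3, and 6.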

In the example given above, the puncturing matrix was selected arbitrarily. How- 
ever, some puncturing matrices are better than others in that the trellis paths have better 
Hamming distance properties. A computer search is usually employed to find good 
puncturing matrices. Generally, the high-rate punctured convolutional codes generated 
in this manner have a free distance that is either equal to or 1 bit less than the best same 
high-rate convolutional code obtained directly without puncturing. 

Yasuda et al. (1984), Hole (1988), Lee (1988), Haccoun and Begin (1989), and 
Begin et al. (1990) have investigated the construction and properties of small and large 
constraint length punctured convolutional codes generated from low-rate codes. In 
general, high-rate codes with good distance properties are obtained by puncturing rate 
1/2 maximum free distance codes. For example, in Table 8.4-1 we list the puncturing 
matrices for code rates of 2/3 ≤ $R_c$ ≤ 7/8 which are obtained by puncturing rate 1/2 codes 
with constraint lengths 3 ≤ K ≤ 9. The free distances of the punctured codes are 
also given in the table. Punctured convolutional codes for additional rates and larger 
constraint lengths may be found in the papers referred to above. 

The decoding of punctured convolutional codes is performed in the same manner 
as the decoding of the low-rate 1/n parent code, using the trellis of the 1/n code. The 
path metrics in the trellis for soft decision decoding are computed in the conventional 
way as described previously. When one or more bits in a branch are punctured, the 
corresponding branch metric increment is computed based on the nonpunctured bits; 
thus, the punctured bits do not contribute to the branch metrics. Error events in a 
punctured code are generally longer than error events in the low-rate 1/n parent code. 
Consequently, the decoder must wait longer than five constraint lengths before making 
FIGURE 8.4-1 

Generation of a rate 3/4 punctured code from a rate 1/3 convolutional code. 


■ TABLE 8.4-1 

Puncturing Matrices for Code Rates of 2/3 ≤ R_c ≤ 7/8 from Rate 1/2 Code 

     Rate 2/3        Rate 3/4        Rate 4/5        Rate 5/6        Rate 6/7        Rate 7/8
K    P        d_free P        d_free P        d_free P        d_free P        d_free P        d_free
3    10       3      101      3      1011     2      10111    2      101111   2      1011111  2
     11              110             1100            11000           110000          1100000
4    11       4      110      4      1011     3      10100    3      100011   2      1000010  2
     10              101             1100            11011           111100          1111101
5    11       4      101      3      1010     3      10111    3      101010   3      1010011  3
     10              110             1101            11000           110101          1101100
6    10       6      100      4      1000     4      10000    4      110110   3      1011101  3
     11              111             1111            11111           101001          1100010
7    11       6      110      5      1111     4      11010    4      111010   3      1111010  3
     10              101             1000            10101           100101          1000101
8    10       7      110      6      1010     5      11100    4      101001   4      1010100  4
     11              101             1101            10011           110110          1101011
9    11       7      111      6      1101     5      10110    5      110110   4      1101011  4
     10              100             1010            11001           101001          1010100
final decisions on the received bits. For soft decision decoding, the performance of 
the punctured codes is given by the error probability (upper bound) expression in 
Equation 8.2-15 for the bit error probability. 
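
The rule that punctured bits do not contribute to the branch metrics can be illustrated as follows. This is a sketch under assumptions, not a definitive decoder component: the names `depuncture` and `branch_metric` are hypothetical, punctured positions are marked with `None`, and the soft-decision branch metric is assumed to be the correlation of the received samples with the antipodal code bits 2c - 1.

```python
def depuncture(received, pmat, num_input_bits):
    """Rebuild the rate 1/n symbol stream, marking punctured positions as None."""
    n, period = len(pmat), len(pmat[0])
    samples = iter(received)
    out = []
    for t in range(num_input_bits):
        for i in range(n):
            # A 1 in the puncturing matrix means the bit was transmitted.
            out.append(next(samples) if pmat[i][t % period] else None)
    return out

def branch_metric(rx_branch, code_branch):
    """Correlation metric over the nonpunctured positions of one branch."""
    return sum(r * (2 * c - 1)
               for r, c in zip(rx_branch, code_branch)
               if r is not None)

# Puncturing matrix of Eq. 8.4-3, used here only as an example.
P_EXAMPLE = [[1, 1, 1],
             [1, 0, 0],
             [0, 0, 0]]
```

After depuncturing, the decoder can run over the trellis of the parent code unchanged; a branch in which every bit is punctured simply contributes a metric of zero for all candidate code branches.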

An approach for the design of good punctured codes is to search and select 
puncturing matrices that yield the maximum free distance. A somewhat better approach is 
to determine the weight spectrum $\{\beta_d\}$ of the dominant terms of the punctured code 
and to calculate the corresponding bit error probability bound. The code corresponding 
to the puncturing matrix that results in the best error rate performance may then be 
selected as the best punctured code, provided that it is not catastrophic. In general, in 
determining the weight spectrum for a punctured code, it is necessary to search through 
a larger number of paths over longer lengths than for the underlying low-rate 1/n parent 
code. Weight spectra for several punctured codes are given in the papers by Haccoun 
and Begin (1989) and Begin et al. (1990). 


8.4-1 Rate-Compatible Punctured Convolutional Codes 

In the transmission of compressed digital speech signals and in some other applications, 
there is a need to transmit some groups of information bits with more redundancy than 
others. In other words, the different groups of information bits require unequal error 
protection to be provided in the transmission of the information sequence, where the 
more important bits are transmitted with more redundancy. Instead of using separate 
codes to encode the different groups of bits, it is desirable to use a single code that 
has variable redundancy. This can be accomplished by puncturing the same low-rate 
1/n convolutional code by different amounts as described by Hagenauer (1988). The 
puncturing matrices are selected to satisfy a rate compatibility criterion, where the 
basic requirement is that lower-rate codes (higher redundancy) transmit the same coded 
bits as all higher-rate codes plus additional bits. The resulting codes obtained from a 
single rate 1/n convolutional code are called rate-compatible punctured convolutional 
(RCPC) codes. 

EXAMPLE 8.4-2. From the rate 1/3, K = 4 maximum free distance convolutional code, 
let us construct an RCPC code. The RCPC codes for this example are taken from 
the paper of Hagenauer (1988), who selected P = 8 and generated codes of rates 
ranging from 4/11 to 8/9. The puncturing matrices are listed in Table 8.4-2. Note that the 
rate 1/2 code has a puncturing matrix with all zeros in the third row. Hence all bits from 
the third branch of the rate 1/3 encoder are deleted. Higher code rates are obtained by 
deleting additional bits from the second branch of the rate 1/3 encoder. However, note 
that when a 1 appears in a puncturing matrix of a high-rate code, a 1 also appears in 
the same position for all lower-rate codes. 
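
The rate compatibility criterion is straightforward to check mechanically. The sketch below assumes the matrices of Table 8.4-2 for rates 1/3 through 1/2 (rows 1 and 2 all ones, row 3 as listed there); the function names and the dictionary `RCPC` are hypothetical.

```python
from fractions import Fraction

ONES, ZEROS = [1] * 8, [0] * 8

# Rate -> puncturing matrix (3 rows, P = 8 columns), read from Table 8.4-2.
RCPC = {
    Fraction(1, 2):  [ONES, ONES, ZEROS],
    Fraction(4, 9):  [ONES, ONES, [1, 0, 0, 0, 1, 0, 0, 0]],
    Fraction(2, 5):  [ONES, ONES, [1, 0, 1, 0, 1, 0, 1, 0]],
    Fraction(4, 11): [ONES, ONES, [1, 1, 1, 0, 1, 1, 1, 0]],
    Fraction(1, 3):  [ONES, ONES, ONES],
}

def rate(pmat):
    """Code rate P / (number of ones in the puncturing matrix)."""
    return Fraction(len(pmat[0]), sum(map(sum, pmat)))

def covers(low_rate, high_rate):
    """True if every 1 of the higher-rate matrix is also 1 in the lower-rate one."""
    return all(lo >= hi
               for lo_row, hi_row in zip(low_rate, high_rate)
               for lo, hi in zip(lo_row, hi_row))

def is_rate_compatible(matrices):
    """Check the criterion for a whole family of puncturing matrices."""
    ordered = sorted(matrices, key=rate, reverse=True)  # highest rate first
    return all(covers(low, high) for high, low in zip(ordered, ordered[1:]))
```

Because the "covers" relation is transitive, checking consecutive pairs in order of decreasing rate is enough to establish the criterion for every pair in the family.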

In applying RCPC codes to systems that require unequal error protection of the 
information sequence, we may format the groups of bits into a frame structure, as 
suggested by Hagenauer et al. (1990) and illustrated in Figure 8.4-2, where, for example, 
three groups of bits of different lengths $N_1$, $N_2$, and $N_3$ are arranged in order of their 
corresponding specified error protection probabilities $p_1 > p_2 > p_3$. Each frame is 
terminated after the last group of information bits ($N_3$) by K - 1 zeros, which result 
TABLE 8.4-2 

Rate-Compatible Punctured Convolutional Codes 
Constructed from Rate 1/3, K = 4 Code with P = 8 
R_c = P/(P + M), M = 1, 2, 4, 6, 8, 10, 12, 14 

Rate     Puncturing Matrix P

1/3      1 1 1 1 1 1 1 1
         1 1 1 1 1 1 1 1
         1 1 1 1 1 1 1 1

4/11     1 1 1 1 1 1 1 1
         1 1 1 1 1 1 1 1
         1 1 1 0 1 1 1 0

2/5      1 1 1 1 1 1 1 1
         1 1 1 1 1 1 1 1
         1 0 1 0 1 0 1 0

4/9      1 1 1 1 1 1 1 1
         1 1 1 1 1 1 1 1
         1 0 0 0 1 0 0 0

1/2      1 1 1 1 1 1 1 1
         1 1 1 1 1 1 1 1
         0 0 0 0 0 0 0 0

4/7      1 1 1 1 1 ...
in overhead bits that are used for the purpose of terminating the trellis in the all-zero 
state. We then select an appropriate set of RCPC codes that satisfy the error protection 
requirements, i.e., the specified error probabilities $\{p_k\}$. In our example, the group of bits 
will be encoded by the use of three puncturing matrices having period P corresponding 
to a set of RCPC codes generated from a rate 1/n code. Thus, the bits requiring the least 


FIGURE 8.4-2 

Frame structure for transmitting data with unequal error protection. 
[Frame fields, last transmitted to first: K - 1 zeros, N_3, N_2, N_1, with p_3 < p_2 < p_1.] 
protection are transmitted first, followed by the bits requiring the next-higher level of 
protection, up to the group of bits requiring the highest level of protection, followed by 
the all-zero terminating sequence. All rate transitions occur within the frame without 
compromising the designed error rate performance requirements. As in the encoding, 
the bits within a frame are decoded by a single Viterbi algorithm using the trellis of the 
rate 1/n code and performing metric computations based on the appropriate puncturing 
matrix for each group of bits. 

It can be shown (see Problem 8.21) that the average effective code rate of this 
scheme is 

$$R_{\text{av}} = \frac{P \sum_{j=1}^{J} N_j}{\sum_{j=1}^{J} N_j (P + M_j) + (K - 1)(P + M_J)} \tag{8.4-4}$$

where J is the number of groups of bits in the frame, P is the period of the RCPC 
codes, and the second term in the denominator corresponds to the overhead code bits 
which are transmitted with the lowest code rate (highest redundancy). 
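
As a quick numerical check of the average effective code rate, the expression can be evaluated with exact fractions. The function name `average_rate` and the frame parameters used in the usage note are hypothetical illustrations, not values taken from the text; the groups are assumed to be listed in order of transmission, so that the last entry of `M` belongs to the lowest-rate code.

```python
from fractions import Fraction

def average_rate(N, M, P, K):
    """Average effective code rate of an RCPC frame (Eq. 8.4-4).

    N: information bits in each group, in order of transmission.
    M: the value M_j of the code R_c = P/(P + M_j) used for each group.
    P: puncturing period; K: constraint length of the parent code.
    """
    if not N or len(N) != len(M):
        raise ValueError("N and M must be nonempty and of equal length")
    info_bits = P * sum(N)
    coded_bits = sum(Nj * (P + Mj) for Nj, Mj in zip(N, M))
    tail_bits = (K - 1) * (P + M[-1])   # tail is sent with the lowest-rate code
    return Fraction(info_bits, coded_bits + tail_bits)
```

For example, with P = 8, K = 4, group lengths (10, 20, 30), and M values (1, 4, 8), the function returns 80/143, slightly below the rate obtained if the K - 1 tail bits were ignored.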


■ 8.5 

OTHER DECODING ALGORITHMS FOR CONVOLUTIONAL CODES 

The Viterbi algorithm described in Section 8.2-1 is the optimum decoding algorithm 
(in the sense of maximum-likelihood decoding of the entire sequence) for convolutional 
codes. However, it requires the computation of $2^{kK}$ metrics at each node of the trellis and 
the storage of $2^{k(K-1)}$ metrics and $2^{k(K-1)}$ surviving sequences, each of which may be 
about 5kK bits long. The computational burden and the storage required to implement 
the Viterbi algorithm make it impractical for convolutional codes with large constraint 
length. 

Prior to the discovery of the optimum algorithm by Viterbi, a number of other 
algorithms had been proposed for decoding convolutional codes. The earliest was the 
sequential decoding algorithm originally proposed by Wozencraft (1957), further treated 
by Wozencraft and Reiffen (1961), and subsequently modified by Fano (1963). 

Sequential decoding algorithm The Fano sequential decoding algorithm searches 
for the most probable path through the tree or trellis by examining one path at a time. The 
increment added to the metric along each branch is proportional to the probability of the 
received signal for that branch, just as in Viterbi decoding, with the exception that an 
additional negative constant is added to each branch metric. The value of this constant 
is selected such that the metric for the correct path will increase on the average, while 
the metric for any incorrect path will decrease on the average. By comparing the metric 
of a candidate path with a moving (increasing) threshold, Fano's algorithm detects and 
discards incorrect paths. 

To be more specific, let us consider a memoryless channel. The metric for the ith 
path through the tree or trellis from the first branch to branch B may be expressed as 

$$U^{(i)} = \sum_{j=1}^{B} \sum_{m=1}^{n} \mu_{jm}^{(i)} \tag{8.5-1}$$

where 

$$\mu_{jm}^{(i)} = \log_2 \frac{p\left(r_{jm} \mid c_{jm}^{(i)}\right)}{p(r_{jm})} - \mathcal{K} \tag{8.5-2}$$

In Equation 8.5-2, $r_{jm}$ is the demodulator output sequence, $p(r_{jm} \mid c_{jm}^{(i)})$ denotes the 
PDF of $r_{jm}$ conditional on the code bit $c_{jm}^{(i)}$ for the mth bit of the jth branch of the ith 
path, and $\mathcal{K}$ is a positive constant. $\mathcal{K}$ is selected as indicated above so that the incorrect 
paths will have a decreasing metric while the correct path will have an increasing metric 
on the average. Note that the term $p(r_{jm})$ in the denominator is independent of the code 
sequence, and, hence, may be subsumed in the constant factor. 

The metric given by Equation 8.5-2 is generally applicable for either hard- or 
soft-decision decoding. However, it can be considerably simplified when hard-decision 
decoding is employed. Specifically, if we have a BSC with transition (error) probability 
p, the metric for each received bit, consistent with the form in Equation 8.5-2, is given by 

$$\mu_{jm}^{(i)} = \begin{cases} \log_2[2(1 - p)] - R_c & (\text{if } \tilde{r}_{jm} = c_{jm}^{(i)}) \\ \log_2 2p - R_c & (\text{if } \tilde{r}_{jm} \ne c_{jm}^{(i)}) \end{cases} \tag{8.5-3}$$

where $\tilde{r}_{jm}$ is the hard-decision output from the demodulator, $c_{jm}^{(i)}$ is the mth code bit in 
the jth branch of the ith path in the tree, and $R_c$ is the code rate. Note that this metric 
requires some (approximate) knowledge of the error probability. 

EXAMPLE 8.5-1. Suppose we have a rate $R_c = 1/3$ binary convolutional code for 
transmitting information over a BSC with p = 0.1. By evaluating Equation 8.5-3 we 
find that 

$$\mu_{jm}^{(i)} = \begin{cases} 0.52 & (\text{if } \tilde{r}_{jm} = c_{jm}^{(i)}) \\ -2.65 & (\text{if } \tilde{r}_{jm} \ne c_{jm}^{(i)}) \end{cases} \tag{8.5-4}$$

To simplify the computations, the metric in Equation 8.5-4 may be normalized. It is 
well approximated as 

$$\mu_{jm}^{(i)} = \begin{cases} 1 & (\text{if } \tilde{r}_{jm} = c_{jm}^{(i)}) \\ -5 & (\text{if } \tilde{r}_{jm} \ne c_{jm}^{(i)}) \end{cases} \tag{8.5-5}$$

Since the code rate is 1/3, there are three output bits from the encoder for each input 
bit. Hence, the branch metric consistent with Equation 8.5-5 is 

$$\mu_j^{(i)} = 3 - 6d$$

or, equivalently, 

$$\mu_j^{(i)} = 1 - 2d \tag{8.5-6}$$

where d is the Hamming distance of the three received bits from the three branch bits. 
Thus, the metric $\mu_j^{(i)}$ is simply related to the Hamming distance between received bits 
and the code bits in the jth branch of the ith path. 
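
The numbers in this example can be reproduced with a short calculation. The sketch below uses hypothetical function names; evaluating Equation 8.5-3 at p = 0.1 and $R_c$ = 1/3 in floating point gives roughly 0.51 and -2.66, close to the values quoted in Equation 8.5-4, and their ratio of about -5 is what motivates the normalized metric.

```python
import math

def fano_bit_metrics(p, code_rate):
    """Per-bit Fano metrics for a BSC (Eq. 8.5-3): (agreement, disagreement)."""
    agree = math.log2(2 * (1 - p)) - code_rate
    disagree = math.log2(2 * p) - code_rate
    return agree, disagree

def normalized_branch_metric(received, branch):
    """Branch metric 1 - 2d of Eq. 8.5-6 for a rate 1/3 code.

    d is the Hamming distance between the three hard-decision bits and
    the three code bits on the branch.
    """
    d = sum(r != c for r, c in zip(received, branch))
    return 1 - 2 * d
```

A branch that matches the received bits exactly scores +1, and each bit disagreement lowers the score by 2, so a path that accumulates errors sees its metric fall steadily.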
FIGURE 8.5-1 

An example of the path search in 
sequential decoding. [From Jordan 
(1966), © 1966 IEEE.] 


Initially, the decoder may be forced to start on the correct path by the transmission 
of a few known bits of data. Then it proceeds forward from node to node, taking the 
most probable (largest metric) branch at each node and increasing the threshold such 
that the threshold is never more than some preselected value, say τ, below the metric. 
Now suppose that the additive noise (for soft-decision decoding) or demodulation errors 
resulting from noise on the channel (for hard-decision decoding) cause the decoder to 
take an incorrect path because it appears more probable than the correct path. This is 
illustrated in Figure 8.5-1. Since the metrics of an incorrect path decrease on the average, 
the metric will fall below the current threshold, say τ_0. When this occurs, the decoder 
backs up and takes alternative paths through the tree or trellis, in order of decreasing 
branch metrics, in an attempt to find another path that exceeds the threshold τ_0. If it is 
successful in finding an alternative path, it continues along that path, always selecting the 
most probable branch at each node. On the other hand, if no path exists that exceeds the 
threshold τ_0, the threshold is reduced by an amount τ and the original path is retraced. 
If the original path does not stay above the new threshold, the decoder resumes its 
backward search for other paths. This procedure is repeated, with the threshold reduced 
by τ for each repetition, until the decoder finds a path that remains above the adjusted 
threshold. A simplified flow diagram of Fano's algorithm is shown in Figure 8.5-2. 

The sequential decoding algorithm requires a buffer memory in the decoder to 
store incoming demodulated data during periods when the decoder is searching for 
alternate paths. When a search terminates, the decoder must be capable of processing 
demodulated bits sufficiently fast to empty the buffer prior to commencing a new search. 
Occasionally, during extremely long searches, the buffer may overflow. This causes loss 
of data, a condition that can be remedied by retransmission of the lost information. In 
this regard, we should mention that the cutoff rate $R_0$ has special meaning in sequential 
decoding. It is the rate above which the average number of decoding operations per 
decoded digit becomes infinite, and it is termed the computational cutoff rate $R_{\text{comp}}$. In 
practice, sequential decoders usually operate at rates near $R_0$. 

The Fano sequential decoding algorithm has been successfully implemented in 
several communication systems. Its error rate performance is comparable to that of 
Viterbi decoding. However, in comparison with Viterbi decoding, sequential decoding 
has a significantly larger decoding delay. On the positive side, sequential decoding 
requires less storage than Viterbi decoding and, hence, it appears attractive for convo- 
lutional codes with a large constraint length. The issues of computational complexity 
and storage requirements for sequential decoding are interesting and have been thor- 
oughly investigated. For an analysis of these topics and other characteristics of the Fano 
FIGURE 8.5-2 

A simplified flow diagram of Fano's algorithm. [From Jordan (1966), © 1966 IEEE.] 

algorithm, the interested reader may refer to Gallager (1968), Wozencraft and Jacobs 
(1965), Savage (1966), and Forney (1974). 

Stack algorithm Another type of sequential decoding algorithm, called a stack 
algorithm, has been proposed independently by Jelinek (1969) and Zigangirov (1966). In 
contrast to the Viterbi algorithm, which keeps track of $2^{(K-1)k}$ paths and corresponding 
metrics, the stack sequential decoding algorithm deals with fewer paths and their 
corresponding metrics. In a stack algorithm, the more probable paths are ordered according 
to their metrics, with the path at the top of the stack having the largest metric. At each 
step of the algorithm, only the path at the top of the stack is extended by one branch. 
This yields $2^k$ successors and their corresponding metrics. These $2^k$ successors along 
with the other paths are then reordered according to the values of the metrics, and all 
paths with metrics that fall below some preselected amount from the metric of the top 
path may be discarded. Then the process of extending the path with the largest metric 
is repeated. Figure 8.5-3 illustrates the first few steps in a stack algorithm. 

It is apparent that when none of the $2^k$ extensions of the path with the largest metric 
remains at the top of the stack, the next step in the search involves the extension of 
another path that has climbed to the top of the stack. It follows that the algorithm does not 
necessarily advance by one branch through the trellis in every iteration. Consequently, 
FIGURE 8.5-3 

An example of the stack algorithm for decoding a rate 1/3 convolutional code. 

Stack with accumulated path metrics 

Step a   Step b   Step c   Step d   Step e   Step f
  -1       -2       -3       -2       -1       -2
  -3       -3       -3       -3       -3       -3
           -4       -4       -4       -4       -4
                    -5       -5       -5       -4
                             -8       -7       -5
                                      -8       -7
                                               -8

some amount of storage must be provided for newly received signals and previously 
received signals in order to allow the algorithm to extend the search along one of the 
shorter paths, when such a path reaches the top of the stack. 
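
The stack search can be sketched compactly with a priority queue. The code below is an illustration under several assumptions that are not taken from the text: a rate 1/3, K = 3 encoder with binary generator taps 100, 101, 111, hard-decision inputs, the normalized branch metric 1 - 2d of Equation 8.5-6, and no discarding of low-metric paths. All function names are hypothetical.

```python
import heapq

# Assumed encoder: rate 1/3, K = 3, generator taps 100, 101, 111 (binary).
GENERATORS = (0b100, 0b101, 0b111)
K = 3

def branch_output(state, bit):
    """Return (output bits, next state); state holds the K - 1 previous inputs."""
    reg = (bit << (K - 1)) | state
    out = tuple(bin(reg & g).count("1") % 2 for g in GENERATORS)
    return out, reg >> 1

def encode(bits):
    """Encode from the all-zero state; one 3-bit tuple per input bit."""
    state, branches = 0, []
    for bit in bits:
        out, state = branch_output(state, bit)
        branches.append(out)
    return branches

def stack_decode(received):
    """Return (decoded bits, path metric) for a list of received 3-bit tuples."""
    # Heap entries are (-metric, path, state). heapq is a min-heap, so the
    # negated metric keeps the largest-metric path at the top of the stack.
    heap = [(0, (), 0)]
    while heap:
        neg_metric, path, state = heapq.heappop(heap)
        if len(path) == len(received):
            return list(path), -neg_metric
        rx = received[len(path)]
        for bit in (0, 1):            # extend only the top path by one branch
            out, nxt = branch_output(state, bit)
            d = sum(a != b for a, b in zip(rx, out))
            heapq.heappush(heap, (neg_metric - (1 - 2 * d), path + (bit,), nxt))
    raise RuntimeError("stack emptied before a full-length path was found")
```

Using a heap avoids fully re-sorting the stack after each extension, which is one way to reduce the reordering cost; a practical decoder would also bound the stack size, which this sketch does not attempt.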

In a comparison of the stack algorithm with the Viterbi algorithm, the stack algo- 
rithm requires fewer metric computations, but this computational saving is offset to a 
large extent by the computations involved in reordering the stack after every iteration. 
In comparison with the Fano algorithm, the stack algorithm is computationally simpler, 
since there is no retracing over the same path as is done in the Fano algorithm. On the 
other hand, the stack algorithm requires more storage than the Fano algorithm. 

Feedback decoding A third alternative to the optimum Viterbi decoder is a method 
called feedback decoding (Heller, 1975), which has been applied to decoding for a BSC 
(hard-decision decoding). In feedback decoding, the decoder makes a hard decision on 
the information bit at stage j based on metrics computed from stage j to stage j + m , 
where m is a preselected positive integer. Thus, the decision on the information bit is 
either 0 or 1 depending on whether the minimum Hamming distance path that begins at 
stage j and ends at stage j + m contains a 0 or 1 in the branch emanating from stage j. 
Once a decision is made on the information bit at stage j, only that part of the tree that 
stems from the bit selected at stage j is kept (half the paths emanating from node j) 
and the remaining part is discarded. This is the feedback feature of the decoder. 
The next step is to extend the part of the tree that has survived to stage j + 1 + m and 
consider the paths from stage j + 1 to j + 1 + m in deciding on the bit at stage j + 1. Thus, 
this procedure is repeated at every stage. The parameter m is simply the number of stages 
in the tree that the decoder looks ahead before making a hard decision. Since a large value 
of m results in a large amount of storage, it is desirable to select m as small as possible. 
On the other hand, m must be sufficiently large to avoid a severe degradation in 
performance. To balance these two conflicting requirements, m is usually selected in the range 
K ≤ m ≤ 2K, where K is the constraint length. Note that this decoding delay is 
significantly smaller than the decoding delay in a Viterbi decoder, which is usually about 5K. 

EXAMPLE 8.5-2. Let us consider the use of a feedback decoder for the rate 1/3 
convolutional code shown in Figure 8.1-2. Figure 8.5-4 illustrates the tree diagram and the 
operation of the feedback decoder for m = 2. That is, in decoding the bit at branch j, 
the decoder considers the paths at branches j, j + 1, and j + 2. Beginning with the 
first branch, the decoder computes eight metrics (Hamming distances) and decides that 
the bit for the first branch is 0 if the minimum distance path is contained in the upper 
part of the tree, and 1 if the minimum distance path is contained in the lower part of 
the tree. In this example, the received sequence for the first three branches is assumed 
to be 101 111 110, so that the minimum distance path is in the upper part of the tree. 
Hence, the first output bit is 0. 

The next step is to extend the upper part of the tree (the part of the tree that has 
survived) by one branch, and to compute the eight metrics for branches 2, 3, and 4. For 
the assumed received sequence 111 110 011, the minimum-distance path is contained 
in the lower part of the section of the tree that survived from the first step. Hence, the 
second output bit is 1. The third step is to extend this lower part of the tree and to repeat 
the procedure described for the first two steps. 
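
The two decoding steps of this example can be checked with a small look-ahead search. The sketch assumes that the rate 1/3, K = 3 encoder has binary generator taps 100, 101, 111, which is consistent with the branch labels of the tree in Figure 8.5-4; the function names are hypothetical.

```python
from itertools import product

# Assumed encoder: rate 1/3, K = 3, generator taps 100, 101, 111 (binary).
GENERATORS = (0b100, 0b101, 0b111)
K = 3

def branch_output(state, bit):
    """Return (output bits, next state); state holds the K - 1 previous inputs."""
    reg = (bit << (K - 1)) | state
    out = tuple(bin(reg & g).count("1") % 2 for g in GENERATORS)
    return out, reg >> 1

def lookahead_metrics(state, received, m=2):
    """Hamming distance of every (m + 1)-branch path that leaves `state`."""
    metrics = {}
    for bits in product((0, 1), repeat=m + 1):
        s, dist = state, 0
        for bit, rx in zip(bits, received):
            out, s = branch_output(s, bit)
            dist += sum(a != b for a, b in zip(rx, out))
        metrics[bits] = dist
    return metrics

def feedback_decide(state, received, m=2):
    """Decide the current bit from the minimum-distance look-ahead path."""
    metrics = lookahead_metrics(state, received, m)
    best = min(metrics, key=metrics.get)
    _, next_state = branch_output(state, best[0])
    return best[0], next_state
```

Starting from the all-zero state with received branches 101 111 110, the eight metrics are 7, 6, 5, 2 for the upper half of the tree and 5, 4, 3, 4 for the lower half, so the first decision is 0. Repeating the search from the surviving state with 111 110 011 gives 7, 6, 5, 6 and 3, 6, 1, 2, so the second decision is 1.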


FIGURE 8.5-4 

An example of feedback decoding for a rate 1/3 convolutional code. 
[Tree diagram with each branch labeled by its three output bits, shown together with the received sequence.] 

Step 1: Upper-tree metrics: 7, 6, 5, 2*; lower-tree metrics: 5, 4, 3, 4 → 0 
Step 2: Upper-tree metrics: 7, 6, 5, 6; lower-tree metrics: 3, 6, 1*, 2 → 1 
Instead of computing metrics as described above, a feedback decoder for the BSC 
may be efficiently implemented by computing the syndrome from the received sequence 
and using a table lookup method for correcting errors. This method is similar to the 
one described for decoding block codes. For some convolutional codes, the feedback 
decoder simplifies to a form called a majority logic decoder or a threshold decoder 
(Massey, 1963; Heller, 1975). 

Soft-output algorithms The outputs of the Viterbi algorithm and the three algo- 
rithms described in this section are hard decisions. In some cases, it is desirable to have 
soft outputs from the decoder. This is the case if the decoding is being performed on an 
inner code in a concatenated code, where it is desirable to provide soft decisions to the 
input of the outer decoder. This is also the case in iterative decoding of concatenated 
codes, previously discussed in the context of block codes in Section 7.13-2, and further 
treated in the context of convolutional codes in Section 8.9-2. 

The optimum metric that provides a measure of the reliability of symbol decisions
is the a posteriori probability of the detected symbol conditioned on the received signal
vector $\mathbf{r} = \{r_{jm},\ m = 1, 2, \ldots, n;\ j = 1, 2, \ldots, B\}$, where $\{r_{jm}\}$
is the sequence of soft outputs from the demodulator, $n$ is the number of output symbols
from the encoder for each $k$ input symbols, and $j$ is the branch index. For example, the
output of the demodulator for a binary convolutional code and binary PSK modulation in an
AWGN channel is

$$r_{jm} = \sqrt{\mathcal{E}_c}\,(2c_{jm} - 1) + n_{jm} \tag{8.5-7}$$

where $\{c_{jm} = 0, 1\}$ are the output bits from the encoder. Given the received vector
$\mathbf{r}$, decisions on the transmitted information bits are based on the maximum a
posteriori probability (MAP), which may be expressed as

$$P(x_i = 0 \mid \mathbf{r}) = 1 - P(x_i = 1 \mid \mathbf{r}) \tag{8.5-8}$$

where $x_i$ denotes the $i$th information bit in the sequence. Thus, under the MAP criterion,
a decision is made on a symbol-by-symbol basis by selecting the information symbol,
or bit in this case, corresponding to the largest a posteriori probability. If the a posteriori
probabilities for the possible transmitted symbols are nearly the same, the decision is
unreliable. Hence, the a posteriori probability associated with the decided symbol (the
hard decision) is the soft output from the decoder that provides a measure, or metric, for
the reliability of the hard decision. Since the MAP criterion minimizes the probability
of a symbol error, the a posteriori probability metric is the optimum soft output of the
decoder.

An algorithm for recursively computing the a posteriori probabilities for each
received symbol given the received signal sequence $\mathbf{r}$ from the demodulator has been
described in the paper by Bahl, Cocke, Jelinek, and Raviv (1974). This symbol-by-symbol
decoding algorithm, called the BCJR algorithm, is based on the MAP criterion
and provides a hard decision on each received symbol and the a posteriori probability
metric that serves as a measure for the reliability of the hard decision. The BCJR
algorithm is described in Section 8.8.

In contrast to the MAP symbol-by-symbol detection criterion, the Viterbi algorithm
selects the sequence that maximizes the probability $p(\mathbf{r} \mid \mathbf{x})$, where
$\mathbf{x}$ is the vector of information bits. In this case, the soft output metric is the
Euclidean distance associated with the sequence of received symbols, as opposed to the
individual symbols. However,
it is possible to derive symbol metrics from the sequence or path metrics. Hagenauer 
and Hoeher (1989) devised a soft-output Viterbi algorithm (SOVA) that provides a 
reliability metric for each decoded symbol. The SOVA is based on the observation 
that the probability that a hard decision on a given symbol at the output of the Viterbi 
algorithm is correct is proportional to the difference in path metrics between a surviving 
sequence and its associated nonsurviving sequences. This observation allows us to form 
an estimate of the error probability, or the probability of a correct decision, for each 
symbol by comparing the path metrics of the surviving path with the path metrics of 
nonsurviving paths. 

For example, let us consider a binary convolutional code with binary PSK modulation.
Since the Viterbi algorithm makes decisions with a decoding delay $\delta$, at time
$t = i + \delta$ the Viterbi decoder outputs the bit $\hat{x}_{is}$ from the most probable
surviving sequence. When we trace back along the surviving path from $t$ to $t - \delta$,
we observe that we have discarded $\delta + 1$ paths. Let us consider the $j$th discarded
path and its corresponding bit $\hat{x}_{ij}$ at time $t = i$. If $\hat{x}_{is} \ne \hat{x}_{ij}$,
let $\psi_j$ ($\psi_j \ge 0$) be equal to the difference in the path metrics between the
surviving path and the $j$th discarded path. If $\hat{x}_{is} = \hat{x}_{ij}$, let
$\psi_j = \infty$. This comparison is performed for all discarded paths. From the set
$\{\psi_j,\ j = 0, 1, 2, \ldots, \delta\}$ we select the smallest value, defined
as $\psi_{\min} = \min\{\psi_0, \psi_1, \ldots, \psi_\delta\}$. Then, the probability of error
for the bit $\hat{x}_{is}$ is approximated as

$$\hat{P}_e = \frac{1}{1 + e^{\psi_{\min}}} \tag{8.5-9}$$

Note that if $\psi_{\min}$ is very small, $\hat{P}_e \approx \tfrac{1}{2}$, so the decision on
$\hat{x}_{is}$ is unreliable. Thus, $\hat{P}_e$ provides a reliability metric for the hard
decisions at the output of the Viterbi algorithm. We note, however, that $\hat{P}_e$ is only
an approximation to the true error probability. That is, $\hat{P}_e$ is not the optimum
soft-output metric for the hard decisions at the output of the Viterbi algorithm. In fact,
it has been observed in a paper by Wang and Wicker (1996) that $\hat{P}_e$ underestimates
the true error probability at low SNR. Nevertheless, this soft-output metric from the
Viterbi algorithm leads to a significant improvement in the performance of the decoder in
a concatenated code.

From Equation 8.5-9 we can obtain an estimate of the probability of a correct
decision as

$$\hat{P}_c = 1 - \hat{P}_e = \frac{e^{\psi_{\min}}}{1 + e^{\psi_{\min}}} \tag{8.5-10}$$
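The reliability computation described above can be sketched in a few lines. This is only an illustration of Equation 8.5-9 under stated assumptions: the function name and the representation of the discarded paths as (bit, path metric) pairs are hypothetical, and the traceback that produces those pairs is not shown.

```python
import math


def sova_error_probability(surviving_bit, surviving_metric, discarded_paths):
    """Approximate error probability of one decoded bit (sketch of Eq. 8.5-9).

    discarded_paths is an assumed representation: a list of (bit, path_metric)
    pairs, one per path discarded along the traceback window.
    """
    psi_min = math.inf
    for bit, metric in discarded_paths:
        if bit != surviving_bit:
            # Only paths that disagree with the decision contribute a finite psi_j.
            psi_min = min(psi_min, abs(surviving_metric - metric))
    # 1 / (1 + e^psi) rewritten as e^-psi / (1 + e^-psi) to avoid overflow;
    # psi_min = inf (no disagreeing path) gives exactly 0.
    e = math.exp(-psi_min)
    return e / (1.0 + e)


def sova_correct_probability(surviving_bit, surviving_metric, discarded_paths):
    """Estimate of the probability of a correct decision (sketch of Eq. 8.5-10)."""
    return 1.0 - sova_error_probability(surviving_bit, surviving_metric, discarded_paths)
```

A metric difference of zero gives an estimate of 1/2 (a completely unreliable decision), while a large difference drives the estimate toward zero.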


■ 8.6 

PRACTICAL CONSIDERATIONS IN THE APPLICATION 
OF CONVOLUTIONAL CODES 

Convolutional codes are widely used in many practical applications of communication
system design. Viterbi decoding is predominantly used for short constraint lengths
($K \le 10$), while sequential decoding is used for long-constraint-length codes, where
the complexity of Viterbi decoding becomes prohibitive. The choice of constraint length
is dictated by the desired coding gain.

TABLE 8.6-1

Upper Bounds on Coding Gain for Soft-Decision Decoding of Some
Convolutional Codes

            Rate 1/2 codes                          Rate 1/3 codes
Constraint             Upper bound,     Constraint             Upper bound,
length K     d_free    dB               length K     d_free    dB

    3           5       3.98                3           8       4.26
    4           6       4.77                4          10       5.23
    5           7       5.44                5          12       6.02
    6           8       6.02                6          13       6.37
    7          10       6.99                7          15       6.99
    8          10       6.99                8          16       7.27
    9          12       7.78                9          18       7.78
   10          12       7.78               10          20       8.24

From the error probability results for soft-decision decoding given by Equations
8.2-11, 8.2-12, and 8.2-13, it is apparent that the coding gain achieved by a
convolutional code over an uncoded binary PSK or QPSK system is

$$\text{Coding gain} \le 10 \log_{10}(R_c\, d_{\text{free}})$$

We also know that the minimum free distance $d_{\text{free}}$ can be increased either by
decreasing the code rate or by increasing the constraint length, or both. Table 8.6-1
provides a list of upper bounds on the coding gain for several convolutional codes. For
purposes of comparison, Table 8.6-2 lists the actual coding gains for several
short-constraint-length convolutional codes with Viterbi decoding. It should be noted that
the coding gain increases toward the asymptotic limit as the SNR per bit increases.
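As a quick check on the bound above, the entries of Table 8.6-1 can be reproduced directly from the code rate and the free distance. The function name below is an assumption made for this sketch.

```python
import math


def coding_gain_bound_db(code_rate, d_free):
    """Upper bound on soft-decision coding gain, 10 log10(R_c * d_free), in dB."""
    return 10.0 * math.log10(code_rate * d_free)


# A few (rate, d_free) pairs taken from Table 8.6-1.
for rate, d_free in [(1 / 2, 5), (1 / 2, 10), (1 / 3, 8), (1 / 3, 20)]:
    print(f"R_c = {rate:.3f}, d_free = {d_free:2d}: {coding_gain_bound_db(rate, d_free):.2f} dB")
```

The rate 1/2, d_free = 5 entry evaluates to 3.98 dB and the rate 1/3, d_free = 20 entry to 8.24 dB, matching the table.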

These results are based on soft-decision Viterbi decoding. If hard-decision decoding 
is used, the coding gains are reduced by approximately 2 dB for the AWGN channel. 

Larger coding gains than those listed in Tables 8.6-1 and 8.6-2 are achieved by
employing long-constraint-length convolutional codes, e.g., K = 50, and decoding
such codes by sequential decoding. Invariably, sequential decoders are implemented
for hard-decision decoding to reduce complexity. Figure 8.6-1 illustrates the error rate
performance of several constraint-length K = 7 convolutional codes for rates 1/2 and
1/3 and for sequential decoding (with hard decisions) of rate 1/2 and rate 1/3
constraint-length K = 41 convolutional codes. Note that the K = 41 codes achieve an
error rate of $10^{-6}$ at 2.5 and 3 dB, which are within 4 to 4.5 dB of the channel capacity
limit, i.e., in the vicinity of the cutoff rate limit. However, the rate 1/2 and rate 1/3,
K = 7 codes with soft-decision Viterbi decoding operate at about 5 and 4.4 dB at $10^{-6}$,
respectively. These short-constraint-length codes achieve a coding gain of about 6 dB
at $10^{-6}$, while the long-constraint-length codes gain about 7.5 to 8 dB.

TABLE 8.6-2

Coding Gain (dB) for Soft-Decision Viterbi Decoding

          E_b/N_0      R_c = 1/3        R_c = 1/2            R_c = 2/3      R_c = 3/4
 P_b      uncoded,
          dB           K=7    K=8       K=5    K=6    K=7    K=6    K=8     K=6    K=9

10^-3       6.8        4.2    4.4       3.3    3.5    3.8    2.9    3.1     2.6    2.6
10^-5       9.6        5.7    5.9       4.3    4.6    5.1    4.2    4.6     3.6    4.2
10^-7      11.3        6.2    6.5       4.9    5.3    5.8    4.7    5.2     3.9    4.8

Source: Jacobs (1974); © IEEE.

FIGURE 8.6-1

Performance of rate 1/2 and rate 1/3
Viterbi and sequential decoding. [From
Omura and Levitt (1982). © 1982 IEEE.]

Two important issues in the implementation of Viterbi decoding are 

1. The effect of path memory truncation, which is a desirable feature that ensures a 
fixed decoding delay. 

2. The degree of quantization of the input signal to the Viterbi decoder. 

As a rule of thumb, we stated that path memory truncation to about five constraint 
lengths has been found to result in negligible performance loss. Figure 8.6-2 illustrates 
the performance obtained by simulation for rate 1/2, constraint-lengths K = 3, 5, and 
7 codes with memory path length of 32 bits. In addition to path memory truncation, 
the computations were performed with eight-level (three bits) quantized input signals 
from the demodulator. The broken curves are performance results obtained from the
upper bound on the bit error rate given by Equation 8.2-12. Note that the simulation
results are close to the theoretical upper bounds, which indicates that the degradation
due to path memory truncation and quantization of the input signal has a minor effect
on performance (0.20 to 0.30 dB).

Figure 8.6-3 illustrates the bit error rate performance obtained via simulation for
hard-decision decoding of convolutional codes with K = 3 to 8. Note that with the K = 8
code, an error rate of $10^{-5}$ requires about 6 dB, which represents a coding gain of nearly
4 dB relative to uncoded QPSK.

FIGURE 8.6-2

Bit error probability for rate 1/2 Viterbi decoding
with eight-level quantized inputs to the decoder and
32-bit path memory. [From Heller and Jacobs (1971).
© 1971 IEEE.]

The effect of input signal quantization is further illustrated in Figure 8.6-4 for a rate
1/2, K = 5 code. Note that 3-bit quantization (eight levels) is about 2 dB better than
hard-decision decoding, which is the ultimate limit between soft-decision decoding
and hard-decision decoding on the AWGN channel. The combined effect of signal
quantization and path memory truncation for the rate 1/2, K = 5 code with 8-, 16-,
and 32-bit path memories and either 1- or 3-bit quantization is shown in Figure 8.6-5.
It is apparent from these results that a path memory as short as three constraint lengths
does not seriously degrade performance.

FIGURE 8.6-3

Performance of rate 1/2 codes with hard-decision
Viterbi decoding and 32-bit path memory truncation.
[From Heller and Jacobs (1971). © 1971 IEEE.]

FIGURE 8.6-4

Performance of rate 1/2, K = 5 code with eight-, four-,
and two-level quantization at the input to the Viterbi
decoder. Path truncation length = 32 bits. [From Heller
and Jacobs (1971). © 1971 IEEE.]

FIGURE 8.6-5

Performance of rate 1/2, K = 5 code with 32-, 16-,
and 8-bit path memory truncation and eight- and
two-level quantization (horizontal axis: SNR per bit, $\gamma_b$, in dB).
[From Heller and Jacobs (1971). © 1971 IEEE.]

FIGURE 8.6-6

Error rate performance of rate 1/2, K = 5 Viterbi decoder
for $\mathcal{E}_b/N_0$ = 3.5 dB and eight-level quantization as a
function of quantizer threshold level spacing for equally
spaced thresholds (horizontal axis: quantizer threshold spacing).
[From Heller and Jacobs (1971). © 1971 IEEE.]

When the signal from the demodulator is quantized to more than two levels, another
problem that must be considered is the spacing between quantization levels.
Figure 8.6-6 illustrates the simulation results for an eight-level uniform quantizer as
a function of the quantizer threshold spacing. We observe that there is an optimum
spacing between thresholds (approximately equal to 0.5). However, the optimum is
sufficiently broad (0.4 to 0.7), so that, once it is set, there is little degradation resulting
from variations in the AGC level of the order of ±20 percent.
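To make the eight-level uniform quantizer concrete, the sketch below maps a demodulator output to one of eight levels using seven equally spaced thresholds centered on zero. The function name, the level numbering 0 to 7, and the default spacing of 0.5 are assumptions for illustration; the spacing is expressed in whatever normalized units the input uses, since the normalization is not restated in the text.

```python
def quantize_eight_level(sample, spacing=0.5):
    """Return an integer level 0..7 for a real-valued demodulator output.

    The seven thresholds sit at -3T, -2T, -T, 0, T, 2T, 3T with T = spacing.
    The level is the number of thresholds lying below the sample.
    """
    thresholds = [spacing * m for m in (-3, -2, -1, 0, 1, 2, 3)]
    return sum(1 for t in thresholds if sample > t)


# Levels 0..3 favor one binary symbol and levels 4..7 the other; the outer
# levels (0 and 7) carry the highest confidence.
print([quantize_eight_level(x) for x in (-2.0, -0.6, -0.1, 0.1, 0.6, 2.0)])
```

Changing `spacing` models the threshold-spacing sweep on the horizontal axis of Figure 8.6-6; scaling `sample` instead models an AGC level variation.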

Finally, we should point out some important results in the performance degradation
due to carrier phase variations. Figure 8.6-7 illustrates the performance of a rate 1/2,
K = 7 code with eight-level quantization and a carrier phase tracking loop SNR $\gamma_L$.
Recall that in a PLL, the phase error has a variance that is inversely proportional to
$\gamma_L$. The results in Figure 8.6-7 indicate that the degradation is large when the loop
SNR is small ($\gamma_L < 12$ dB), and causes the error rate performance to bottom out at a
relatively high error rate.

FIGURE 8.6-7

Performance of a rate 1/2, K = 7 code with
Viterbi decoding and eight-level quantization
as a function of the carrier phase tracking loop
SNR $\gamma_L$. [From Heller and Jacobs (1971).
© 1971 IEEE.]


■ 8.7 

NONBINARY DUAL-k CODES AND CONCATENATED CODES

Our treatment of convolutional codes thus far has been focused primarily on binary 
codes. Binary codes are particularly suitable for channels in which binary or quaternary
PSK modulation and coherent demodulation are possible. However, there are many
applications in which PSK modulation and coherent demodulation are not suitable or
possible. In such cases, other modulation techniques, e.g., M-ary FSK, are employed in
conjunction with noncoherent demodulation. Nonbinary codes are particularly matched
to M-ary signals that are demodulated noncoherently.

In this subsection, we describe a class of nonbinary convolutional codes, called 
dual-k codes, that are easily decoded by means of the Viterbi algorithm using either 
soft-decision or hard-decision decoding. They are also suitable either as an outer code 
or as an inner code in a concatenated code, as will also be described below. 

A dual-k rate 1/2 convolutional encoder may be represented as shown in
Figure 8.7-1. It consists of two (K = 2) k-bit shift-register stages and n = 2k function
generators. Its output is two k-bit symbols. We note that the code considered in
Example 8.1-4 is a dual-2 convolutional code.

FIGURE 8.7-1

Encoder for rate 1/2 dual-k codes.


The 2k function generators for the dual-k codes have been given by Viterbi and
Jacobs (1975). These may be expressed in the form

(8.7-1)


where $I_k$ denotes the $k \times k$ identity matrix.

The general form for the transfer function of a rate 1/2 dual-k code has been derived
by Odenwalder (1976). It is expressed as

$$T(Y, Z, J) = \frac{(2^k - 1) Z^4 J^2 Y}{1 - YJ\left[2Z + (2^k - 3) Z^2\right]} = \sum_{i=4}^{\infty} a_i Z^i Y^{f(i)} J^{h(i)} \tag{8.7-2}$$

where $Z$ represents the Hamming distance for the $q$-ary ($q = 2^k$) symbols, the $f(i)$
exponent on $Y$ represents the number of information symbol errors that are produced
in selecting a branch in the tree or trellis other than a corresponding branch on the
all-zero path, and the $h(i)$ exponent on $J$ is equal to the number of branches in a given
path. Note that the minimum free distance is $d_{\text{free}} = 4$ symbols ($4k$ bits).

Lower-rate dual-k convolutional codes can be generated in a number of ways, the
simplest of which is to repeat each symbol generated by the rate 1/2 code $r$ times,
where $r = 1, 2, \ldots, m$ ($r = 1$ corresponds to each symbol appearing once). If each
symbol in any particular branch of the tree or trellis or state diagram is repeated $r$ times,
the effect is to increase the distance parameter from $Z$ to $Z^r$. Consequently the transfer
function for a rate $1/2r$ dual-k code is

$$T(Y, Z, J) = \frac{(2^k - 1) Z^{4r} J^2 Y}{1 - YJ\left[2Z^r + (2^k - 3) Z^{2r}\right]} \tag{8.7-3}$$

In the transmission of long information sequences, the path length parameter $J$
in the transfer function may be suppressed by setting $J = 1$. The resulting transfer
function $T(Y, Z)$ may be differentiated with respect to $Y$, and $Y$ is set to unity. This
yields

$$\left.\frac{dT(Y, Z)}{dY}\right|_{Y=1} = \frac{(2^k - 1) Z^{4r}}{\left[1 - 2Z^r - (2^k - 3) Z^{2r}\right]^2} = \sum_{i=4r}^{\infty} \beta_i Z^i \tag{8.7-4}$$

where $\beta_i$ represents the number of symbol errors associated with a path having distance
$Z^i$ from the all-zero path, as described previously in Section 8.2-2. The expression in
Equation 8.7-4 may be used to evaluate the error probability for dual-k codes under
various channel conditions.
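The coefficients $\beta_i$ in Equation 8.7-4 can be obtained numerically by expanding the rational function as a power series in $x = Z^r$. The following is a minimal sketch of one way to do this; the function name and the dictionary return format are assumptions, and the expansion uses the fact that the coefficients $g_n$ of $1/(1 - 2x - cx^2)$, with $c = 2^k - 3$, satisfy $g_n = 2g_{n-1} + c\,g_{n-2}$.

```python
def dual_k_beta(k, r, num_terms):
    """First num_terms nonzero coefficients beta_i of Eq. 8.7-4, keyed by distance i.

    Expands (2^k - 1) x^4 / (1 - 2x - c x^2)^2 with x = Z^r and c = 2^k - 3.
    """
    c = 2**k - 3
    # g[n]: power-series coefficients of 1 / (1 - 2x - c x^2).
    g = [1, 2]
    while len(g) < num_terms:
        g.append(2 * g[-1] + c * g[-2])
    g = g[:num_terms]
    # h[n]: coefficients of the square of that series (Cauchy product).
    h = [sum(g[j] * g[n - j] for j in range(n + 1)) for n in range(num_terms)]
    # Multiply by (2^k - 1) x^4 and map the power of x back to a power of Z.
    return {(4 + n) * r: (2**k - 1) * h[n] for n in range(num_terms)}


print(dual_k_beta(k=2, r=1, num_terms=3))
```

For the dual-2 code with $r = 1$ the leading term is $3Z^4$, in agreement with the minimum free distance of 4 symbols; with $r = 2$ the same coefficients appear at distances 8, 10, 12, and so on.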


Performance of dual-k codes with M-ary modulation  Suppose that a dual-k code
is used in conjunction with M-ary orthogonal signaling at the modulator, where
$M = 2^k$. Each symbol from the encoder is mapped into one of the M possible orthogonal
waveforms. The channel is assumed to add white Gaussian noise. The demodulator
consists of M matched filters.

If the decoder performs hard-decision decoding, the performance of the code is
determined by the symbol error probability $P_e$. This error probability has been computed
in Chapter 4 for both coherent and noncoherent detection. From $P_e$, we can determine
$P_2(d)$ according to Equation 8.2-16 or 8.2-17, which is the probability of error in a
pairwise comparison of the all-zero path with a path that differs in $d$ symbols. The
probability of a bit error is upper-bounded as

$$P_b < \frac{2^{k-1}}{2^k - 1} \sum_{d=4r}^{\infty} \beta_d P_2(d) \tag{8.7-5}$$

The factor $2^{k-1}/(2^k - 1)$ is used to convert the symbol error probability to the bit error
probability.

Instead of hard-decision decoding, suppose that the decoder performs soft-decision
decoding using the output of a demodulator that employs a square-law detector. The
expression for the bit error probability given by Equation 8.7-5 still applies, but now
$P_2(d)$ is given by (see Section 11.1-1)

$$P_2(d) = \frac{1}{2^{2d-1}} \exp\left(-\tfrac{1}{2} \gamma_b R_c d\right) \sum_{i=0}^{d-1} K_i \left(\tfrac{1}{2} \gamma_b R_c d\right)^i \tag{8.7-6}$$

where

$$K_i = \frac{1}{i!} \sum_{l=0}^{d-1-i} \binom{2d-1}{l} \tag{8.7-7}$$

and $R_c = 1/2r$ is the code rate.
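The pairwise error probability for square-law detection can be evaluated numerically as sketched below. This is an illustration only: the function name is hypothetical, $\gamma_b$ is taken as a linear (not dB) SNR per bit, and the coefficient $K_i = \frac{1}{i!}\sum_{l=0}^{d-1-i}\binom{2d-1}{l}$ is assumed to be the standard one from the square-law combining result referenced in Section 11.1-1.

```python
import math


def pairwise_error_square_law(d, gamma_b, code_rate):
    """Sketch of P_2(d) for soft-decision decoding with a square-law detector.

    d         -- number of symbols in which the two paths differ
    gamma_b   -- SNR per bit as a linear ratio (assumption: not in dB)
    code_rate -- R_c = 1/(2r)
    """
    x = 0.5 * gamma_b * code_rate * d
    total = 0.0
    for i in range(d):
        # K_i = (1/i!) * sum_{l=0}^{d-1-i} C(2d-1, l)  (assumed standard form)
        k_i = sum(math.comb(2 * d - 1, l) for l in range(d - i)) / math.factorial(i)
        total += k_i * x**i
    return math.exp(-x) * total / 2 ** (2 * d - 1)
```

Two sanity checks follow from the formula itself: at zero SNR the result is 1/2 for any $d$, and for $d = 1$ it reduces to $\tfrac{1}{2}\exp(-\tfrac{1}{2}\gamma_b R_c)$, the familiar form for binary orthogonal signaling with noncoherent detection.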

Concatenated codes In Section 7.13-2, we considered the concatenation of two 
block codes to form a long block code. Now that we have described convolutional 
codes, we broaden our viewpoint and consider the concatenation of a block code with 
a convolutional code or the concatenation of two convolutional codes. 

In a conventional concatenated code, the outer code is usually chosen to be nonbinary,
with each symbol selected from an alphabet of $q = 2^k$ symbols. This code
may be a block code, such as a Reed-Solomon code, or a convolutional code, such as
a dual-k code. The inner code may be either binary or nonbinary, and either a block
or a convolutional code. For example, a Reed-Solomon code may be selected as the
outer code and a dual-k code may be selected as the inner code. In such a concatenation
scheme, the number of symbols in the outer (Reed-Solomon) code $q$ equals $2^k$, so that
each symbol of the outer code maps into a k-bit symbol of the inner dual-k code. M-ary
orthogonal signals may be used to transmit the symbols.

The decoding of such concatenated codes may also take a variety of different 
forms. If the inner code is a convolutional code having a short constraint length, the 
Viterbi algorithm provides an efficient means for decoding, using either soft-decision 
or hard-decision decoding. 

If the inner code is a block code, and the decoder for this code performs soft- 
decision decoding, the outer decoder may also perform soft-decision decoding using 
as inputs the metrics corresponding to each word of the inner code. On the other hand, 
the inner decoder may make a hard decision after receipt of the code word and feed the 
hard decisions to the outer decoder. Then the outer decoder must perform hard-decision 
decoding. 

The following example describes a concatenated code in which the outer code is a 
convolutional code and the inner code is a block code. 

example 8.7-1. Suppose we construct a concatenated code by selecting a dual-k code
as the outer code and a Hadamard code as the inner code. To be specific, we select a
rate 1/2 dual-5 code and a Hadamard (16, 5) inner code. The dual-5 rate 1/2 code has
a minimum free distance $D_{\text{free}} = 4$ and the Hadamard code has a minimum distance
$d_{\min} = 8$. Hence, the concatenated code has an effective minimum distance of
$4 \times 8 = 32$. Since there are 32 code words in the Hadamard code and 32 possible symbols
in the outer code, in effect, each symbol from the outer code is mapped into one of the
32 Hadamard code words.

The probability of a symbol error in decoding the inner code may be determined
from the results of the performance of block codes given in Sections 7.4 and 7.5
for soft-decision and hard-decision decoding, respectively. First, suppose that
hard-decision decoding is performed in the inner decoder with the probability of a code word
(symbol of outer code) error denoted as $P_{32}$, since M = 32. Then the performance of
the outer code and, hence, the performance of the concatenated code is obtained by
using this error probability in conjunction with the transfer function for the dual-5 code
given by Equation 8.7-2.

On the other hand, if soft-decision decoding is used on both the outer and the inner 
codes, the soft-decision metric from each received Hadamard code word is passed to 
the Viterbi algorithm, which computes the accumulated metrics for the competing paths 
through the trellis. We shall give numerical results on the performance of concatenated 
codes of this type in our discussion of coding for Rayleigh fading channels. 


■ 8.8 

MAXIMUM A POSTERIORI DECODING OF CONVOLUTIONAL 
CODES —THE BCJR ALGORITHM 

The BCJR algorithm, named after Bahl, Cocke, Jelinek, and Raviv (Bahl et al. (1974)),
is a symbol-by-symbol maximum a posteriori decoding algorithm for convolutional
codes. In this algorithm the decoder uses the MAP rule to decode each input
symbol to the decoder rather than looking for the most likely input sequence.

We know that convolutional codes are finite memory encoders in which the output
and the next state depend on the current state and the input. Assuming $k = 1$, we denote
an information sequence of length $N$ by $\mathbf{u} = (u_1, u_2, \ldots, u_N)$ where
$u_i \in \{0, 1\}$, and the corresponding encoded sequence by†
$\mathbf{c} = (\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_N)$ where the length of
$\mathbf{c}_i$ is $n$. The encoder state at time $i$ is denoted by $\sigma_i$. For
$1 \le i \le N$ we have

$$\mathbf{c}_i = f_c(u_i, \sigma_{i-1}) \tag{8.8-1}$$

$$\sigma_i = f_s(u_i, \sigma_{i-1}) \tag{8.8-2}$$

where functions $f_c$ and $f_s$ define the codeword and the new state as functions of the
input $u_i \in \{0, 1\}$ and the previous state $\sigma_{i-1} \in \Sigma$, where $\Sigma$ denotes
the set of all states. It is clear that any pair of states $(\sigma_{i-1}, \sigma_i)$ that
satisfies Equation 8.8-2 corresponds either to $u_i = 1$ or to $u_i = 0$. Therefore, we can
partition the set of all pairs of states $(\sigma_{i-1}, \sigma_i)$ which correspond to all
possible transitions into two subsets $S_0$ and $S_1$, corresponding to $u_i = 0$ and
$u_i = 1$, respectively.

†We use $\mathbf{c}$ to denote both the encoded sequence, which is a binary sequence of
length $nN$ with elements from $\{0, 1\}$, and the encoded sequence after BPSK modulation,
which is a sequence of length $nN$ with elements from $\pm\sqrt{\mathcal{E}_c}$. It should be
clear from the context which notion is used.

The symbol-by-symbol maximum a posteriori decoding receives
$\mathbf{y} = (\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_N)$, the demodulator output, and
based on this observation decodes $u_i$ using the maximum a posteriori rule

$$\begin{aligned}
\hat{u}_i &= \arg\max_{u_i \in \{0,1\}} P(u_i \mid \mathbf{y}) \\
&= \arg\max_{u_i \in \{0,1\}} \frac{p(u_i, \mathbf{y})}{p(\mathbf{y})} \\
&= \arg\max_{u_i \in \{0,1\}} p(u_i, \mathbf{y}) \\
&= \arg\max_{\ell \in \{0,1\}} \sum_{(\sigma_{i-1}, \sigma_i) \in S_\ell} p(\sigma_{i-1}, \sigma_i, \mathbf{y})
\end{aligned} \tag{8.8-3}$$

where the last equality follows from the fact that $u_i = \ell$ corresponds to all pairs of
states $(\sigma_{i-1}, \sigma_i) \in S_\ell$ for $\ell = 0, 1$.

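If we define

$$\mathbf{y}_1^{(i-1)} = (\mathbf{y}_1, \ldots, \mathbf{y}_{i-1}), \qquad
\mathbf{y}_{i+1}^{(N)} = (\mathbf{y}_{i+1}, \ldots, \mathbf{y}_N) \tag{8.8-4}$$

we can write

$$\mathbf{y} = \left(\mathbf{y}_1^{(i-1)}, \mathbf{y}_i, \mathbf{y}_{i+1}^{(N)}\right) \tag{8.8-5}$$

and we have

$$\begin{aligned}
p(\sigma_{i-1}, \sigma_i, \mathbf{y})
&= p\left(\sigma_{i-1}, \sigma_i, \mathbf{y}_1^{(i-1)}, \mathbf{y}_i, \mathbf{y}_{i+1}^{(N)}\right) \\
&= p\left(\sigma_{i-1}, \sigma_i, \mathbf{y}_1^{(i-1)}, \mathbf{y}_i\right)
   p\left(\mathbf{y}_{i+1}^{(N)} \mid \sigma_{i-1}, \sigma_i, \mathbf{y}_1^{(i-1)}, \mathbf{y}_i\right) \\
&= p\left(\sigma_{i-1}, \mathbf{y}_1^{(i-1)}\right)
   p\left(\sigma_i, \mathbf{y}_i \mid \sigma_{i-1}, \mathbf{y}_1^{(i-1)}\right)
   p\left(\mathbf{y}_{i+1}^{(N)} \mid \sigma_{i-1}, \sigma_i, \mathbf{y}_1^{(i-1)}, \mathbf{y}_i\right) \\
&= p\left(\sigma_{i-1}, \mathbf{y}_1^{(i-1)}\right)
   p\left(\sigma_i, \mathbf{y}_i \mid \sigma_{i-1}\right)
   p\left(\mathbf{y}_{i+1}^{(N)} \mid \sigma_i\right)
\end{aligned} \tag{8.8-6}$$

where the first three steps follow from the chain rule and the last step follows from
Markov properties of the state in a trellis.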


At this point we define $\alpha_{i-1}(\sigma_{i-1})$, $\beta_i(\sigma_i)$, and
$\gamma_i(\sigma_{i-1}, \sigma_i)$ as

$$\begin{aligned}
\alpha_{i-1}(\sigma_{i-1}) &= p\left(\sigma_{i-1}, \mathbf{y}_1^{(i-1)}\right) \\
\beta_i(\sigma_i) &= p\left(\mathbf{y}_{i+1}^{(N)} \mid \sigma_i\right) \\
\gamma_i(\sigma_{i-1}, \sigma_i) &= p\left(\sigma_i, \mathbf{y}_i \mid \sigma_{i-1}\right)
\end{aligned} \tag{8.8-7}$$

Using these definitions in Equation 8.8-6, we have

$$p(\sigma_{i-1}, \sigma_i, \mathbf{y}) = \alpha_{i-1}(\sigma_{i-1})\, \gamma_i(\sigma_{i-1}, \sigma_i)\, \beta_i(\sigma_i) \tag{8.8-8}$$

and hence from Equation 8.8-3 we obtain

$$\hat{u}_i = \arg\max_{\ell \in \{0,1\}} \sum_{(\sigma_{i-1}, \sigma_i) \in S_\ell}
\alpha_{i-1}(\sigma_{i-1})\, \gamma_i(\sigma_{i-1}, \sigma_i)\, \beta_i(\sigma_i) \tag{8.8-9}$$

Equation 8.8-9 indicates that for maximum a posteriori decoding we need the values
of $\alpha_{i-1}(\sigma_{i-1})$, $\beta_i(\sigma_i)$, and $\gamma_i(\sigma_{i-1}, \sigma_i)$. It
should also be clear that although our development of these equations was based on the
assumption of $k = 1$ and $u_i \in \{0, 1\}$, the extension of these results to general $k$
is straightforward.

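Now we derive recursion relations for $\alpha_{i-1}(\sigma_{i-1})$ and $\beta_i(\sigma_i)$
which facilitate their computation.

The Forward Recursion for $\alpha_i(\sigma_i)$  We show that $\alpha_{i-1}(\sigma_{i-1})$ can be
obtained by using a forward recursion of the form

$$\alpha_i(\sigma_i) = \sum_{\sigma_{i-1} \in \Sigma} \gamma_i(\sigma_{i-1}, \sigma_i)\, \alpha_{i-1}(\sigma_{i-1}),
\qquad 1 \le i \le N \tag{8.8-10}$$

To prove Equation 8.8-10, we use the following set of relations

$$\begin{aligned}
\alpha_i(\sigma_i) &= p\left(\sigma_i, \mathbf{y}_1^{(i)}\right) \\
&= \sum_{\sigma_{i-1} \in \Sigma} p\left(\sigma_{i-1}, \sigma_i, \mathbf{y}_1^{(i-1)}, \mathbf{y}_i\right) \\
&= \sum_{\sigma_{i-1} \in \Sigma} p\left(\sigma_{i-1}, \mathbf{y}_1^{(i-1)}\right)
   p\left(\sigma_i, \mathbf{y}_i \mid \sigma_{i-1}, \mathbf{y}_1^{(i-1)}\right) \\
&= \sum_{\sigma_{i-1} \in \Sigma} p\left(\sigma_{i-1}, \mathbf{y}_1^{(i-1)}\right)
   p\left(\sigma_i, \mathbf{y}_i \mid \sigma_{i-1}\right) \\
&= \sum_{\sigma_{i-1} \in \Sigma} \alpha_{i-1}(\sigma_{i-1})\, \gamma_i(\sigma_{i-1}, \sigma_i)
\end{aligned} \tag{8.8-11}$$

which completes the proof of the forward recursion relation for $\alpha_i(\sigma_i)$. This
relation means that given the values of $\gamma_i(\sigma_{i-1}, \sigma_i)$, it is possible to
obtain $\alpha_i(\sigma_i)$ from $\alpha_{i-1}(\sigma_{i-1})$. If we assume that the trellis
starts in the all-zero state, the initial condition for the forward recursion becomes

$$\alpha_0(\sigma_0) = P(\sigma_0) = \begin{cases} 1, & \sigma_0 = 0 \\ 0, & \sigma_0 \ne 0 \end{cases} \tag{8.8-12}$$

Equations 8.8-10 and 8.8-12 provide a complete set of recursions for computing the
values of $\alpha$.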

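The Backward Recursion for $\beta_i(\sigma_i)$  The backward recursion for computing the
values of $\beta$ is given by

$$\beta_{i-1}(\sigma_{i-1}) = \sum_{\sigma_i \in \Sigma} \beta_i(\sigma_i)\, \gamma_i(\sigma_{i-1}, \sigma_i),
\qquad 1 \le i \le N \tag{8.8-13}$$

To prove this recursion, we note that

$$\begin{aligned}
\beta_{i-1}(\sigma_{i-1}) &= p\left(\mathbf{y}_i^{(N)} \mid \sigma_{i-1}\right) \\
&= \sum_{\sigma_i \in \Sigma} p\left(\mathbf{y}_i, \mathbf{y}_{i+1}^{(N)}, \sigma_i \mid \sigma_{i-1}\right) \\
&= \sum_{\sigma_i \in \Sigma} p\left(\sigma_i, \mathbf{y}_i \mid \sigma_{i-1}\right)
   p\left(\mathbf{y}_{i+1}^{(N)} \mid \sigma_i, \mathbf{y}_i, \sigma_{i-1}\right) \\
&= \sum_{\sigma_i \in \Sigma} p\left(\sigma_i, \mathbf{y}_i \mid \sigma_{i-1}\right)
   p\left(\mathbf{y}_{i+1}^{(N)} \mid \sigma_i\right) \\
&= \sum_{\sigma_i \in \Sigma} \gamma_i(\sigma_{i-1}, \sigma_i)\, \beta_i(\sigma_i)
\end{aligned} \tag{8.8-14}$$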

The boundary condition for the backward recursion, assuming that the trellis is
terminated in the all-zero state, is

$$\beta_N(\sigma_N) = \begin{cases} 1, & \sigma_N = 0 \\ 0, & \sigma_N \ne 0 \end{cases} \tag{8.8-15}$$

The recursive relations 8.8-10 and 8.8-13 together with initial conditions 8.8-12
and 8.8-15 provide the necessary equations to determine the $\alpha$'s and $\beta$'s when
the $\gamma$'s are known. We now focus on computation of the $\gamma$'s.

Computing $\gamma_i(\sigma_{i-1}, \sigma_i)$  We can write $\gamma_i(\sigma_{i-1}, \sigma_i)$,
$1 \le i \le N$, as

$$\begin{aligned}
\gamma_i(\sigma_{i-1}, \sigma_i) &= p\left(\sigma_i, \mathbf{y}_i \mid \sigma_{i-1}\right) \\
&= p\left(\sigma_i \mid \sigma_{i-1}\right) p\left(\mathbf{y}_i \mid \sigma_i, \sigma_{i-1}\right) \\
&= P(u_i)\, p\left(\mathbf{y}_i \mid u_i\right) \\
&= P(u_i)\, p\left(\mathbf{y}_i \mid \mathbf{c}_i\right)
\end{aligned} \tag{8.8-16}$$

where we have used the fact that there exists a one-to-one correspondence between a
pair of states $(\sigma_{i-1}, \sigma_i)$ and the input $u_i$ through Equation 8.8-2. The above
expression clearly shows the dependence of $\gamma_i(\sigma_{i-1}, \sigma_i)$ on $P(u_i)$, the
prior probability of the information sequence at time $i$, as well as
$p(\mathbf{y}_i \mid \mathbf{c}_i)$, which depends on the channel characteristics. If the
information sequence is equiprobable, an assumption that is usually made when no
information is available, then $P(u_i = 0) = P(u_i = 1) = \tfrac{1}{2}$. Obviously,
the above derivation is based on the assumption that the state pair
$(\sigma_{i-1}, \sigma_i)$ is a valid pair; i.e., a transition from $\sigma_{i-1}$ to
$\sigma_i$ is possible.

Equation 8.8-9 together with the forward and backward relations for $\alpha$ and $\beta$ given
in Equations 8.8-10 and 8.8-13 and Equation 8.8-16 for $\gamma$ are known as the BCJR
algorithm for symbol-by-symbol MAP decoding of a convolutional code.

Note that unlike the Viterbi algorithm that looks for the most likely information
sequence, the BCJR finds the most likely individual bits, or symbols. The BCJR
algorithm also provides the values of $P(u_i \mid \mathbf{y})$. These values provide a level
of certainty of the decoder about the value of $u_i$ and are called soft outputs or soft
values. Having $P(u_i \mid \mathbf{y})$, we can find the a posteriori L values as


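$$\begin{aligned}
L(u_i) &= \ln \frac{P(u_i = 1 \mid \mathbf{y})}{P(u_i = 0 \mid \mathbf{y})} \\
&= \ln \frac{p(u_i = 1, \mathbf{y})}{p(u_i = 0, \mathbf{y})} \\
&= \ln \frac{\sum_{(\sigma_{i-1}, \sigma_i) \in S_1} \alpha_{i-1}(\sigma_{i-1})\, \gamma_i(\sigma_{i-1}, \sigma_i)\, \beta_i(\sigma_i)}
{\sum_{(\sigma_{i-1}, \sigma_i) \in S_0} \alpha_{i-1}(\sigma_{i-1})\, \gamma_i(\sigma_{i-1}, \sigma_i)\, \beta_i(\sigma_i)}
\end{aligned} \tag{8.8-17}$$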


which are also referred to as soft outputs. Knowledge of soft outputs is crucial in
decoding of turbo codes discussed later in this chapter. A decoder such as the BCJR
decoder that accepts soft inputs (the vector $\mathbf{y}$) and generates soft outputs is
called a soft-input soft-output (SISO) decoder. Note that the decoding rule based on
$L(u_i)$ soft values is given by

$$\hat{u}_i = \begin{cases} 1, & L(u_i) \ge 0 \\ 0, & L(u_i) < 0 \end{cases} \tag{8.8-18}$$


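For an AWGN channel, where $\mathbf{y} = \mathbf{c} + \mathbf{n}$ and $\mathbf{c}$ represents
the modulated signal corresponding to the encoded sequence, we have

$$\gamma_i(\sigma_{i-1}, \sigma_i) = \frac{P(u_i)}{(\pi N_0)^{n/2}}
\exp\left(-\frac{\|\mathbf{y}_i - \mathbf{c}_i\|^2}{N_0}\right) \tag{8.8-19}$$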
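The pieces above (Equations 8.8-10, 8.8-12, 8.8-13, 8.8-15, 8.8-17, and 8.8-19) can be combined into a compact probability-domain decoder. The following is a sketch under several assumptions that are not part of the text: the example code is a hypothetical rate 1/2, four-state feedforward code with generators (7, 5) in octal, code bit 1 is mapped to $+\sqrt{\mathcal{E}_c}$ and bit 0 to $-\sqrt{\mathcal{E}_c}$, the trellis is terminated with two zero tail bits, and no normalization of $\alpha$ or $\beta$ is performed, so it is suitable only for short blocks.

```python
import math

NUM_STATES = 4   # assumed example code: 4 states (two memory elements)
MEMORY = 2       # number of zero tail bits that terminate the trellis


def step(state, u):
    """One trellis section of the assumed (7, 5) code: returns (next_state, c_i)."""
    s1, s2 = state >> 1, state & 1          # s1 = u_{i-1}, s2 = u_{i-2}
    return (u << 1) | s1, (u ^ s1 ^ s2, u ^ s2)


def encode(bits):
    """Encode a bit list (tail bits supplied by the caller), starting in state 0."""
    state, out = 0, []
    for u in bits:
        state, c = step(state, u)
        out.append(c)
    return out


def bcjr(y, n0, ec=1.0, p1=0.5):
    """Return L(u_i) for the information positions (tail positions excluded)."""
    n_sections = len(y)
    amp = math.sqrt(ec)

    def gamma(i, state, u):
        # Eq. 8.8-19 with the BPSK mapping c -> amp * (2c - 1).
        nxt, c = step(state, u)
        dist2 = sum((yv - amp * (2 * cv - 1)) ** 2 for yv, cv in zip(y[i], c))
        prior = p1 if u == 1 else 1.0 - p1
        return nxt, prior * math.exp(-dist2 / n0) / (math.pi * n0) ** (len(c) / 2)

    # Forward recursion (Eq. 8.8-10), started from the all-zero state (Eq. 8.8-12).
    alpha = [[0.0] * NUM_STATES for _ in range(n_sections + 1)]
    alpha[0][0] = 1.0
    for i in range(n_sections):
        for s in range(NUM_STATES):
            for u in (0, 1):
                nxt, g = gamma(i, s, u)
                alpha[i + 1][nxt] += alpha[i][s] * g

    # Backward recursion (Eq. 8.8-13), ended in the all-zero state (Eq. 8.8-15).
    beta = [[0.0] * NUM_STATES for _ in range(n_sections + 1)]
    beta[n_sections][0] = 1.0
    for i in range(n_sections - 1, -1, -1):
        for s in range(NUM_STATES):
            for u in (0, 1):
                nxt, g = gamma(i, s, u)
                beta[i][s] += g * beta[i + 1][nxt]

    # A posteriori L values (Eq. 8.8-17); sums[u] accumulates over the set S_u.
    l_values = []
    for i in range(n_sections - MEMORY):
        sums = [0.0, 0.0]
        for s in range(NUM_STATES):
            for u in (0, 1):
                nxt, g = gamma(i, s, u)
                sums[u] += alpha[i][s] * g * beta[i + 1][nxt]
        l_values.append(math.log(sums[1] / sums[0]))
    return l_values
```

A positive returned value corresponds to the decision $\hat{u}_i = 1$ in Equation 8.8-18. Because the forward and backward quantities are products of many small numbers, this direct form underflows for long trellises.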


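example 8.8-1. Let us consider the special case when $n = 2$, the convolutional code
is systematic, and the modulation is BPSK. In this case we have
$\mathbf{c}_i = (c_i^s, c_i^p)$ and $\mathbf{y}_i = (y_i^s, y_i^p)$, where the superscripts
$s$ and $p$ represent the terms corresponding to the systematic (information) bit and
parity check bit, respectively. Here $c_i^s = \pm\sqrt{\mathcal{E}_c}$ depending on whether
$u_i = 1$ or $u_i = 0$. The value of $c_i^p$ can also be one of the two possible values
$\pm\sqrt{\mathcal{E}_c}$. Using these values, Equation 8.8-19 becomes

$$\begin{aligned}
\gamma_i(\sigma_{i-1}, \sigma_i) &= \frac{P(u_i)}{\pi N_0}
\exp\left(-\frac{(y_i^s - c_i^s)^2 + (y_i^p - c_i^p)^2}{N_0}\right) \\
&= \frac{1}{\pi N_0} \exp\left(-\frac{(y_i^s)^2 + (y_i^p)^2 + 2\mathcal{E}_c}{N_0}\right)
P(u_i) \exp\left(\frac{2 y_i^s c_i^s + 2 y_i^p c_i^p}{N_0}\right)
\end{aligned} \tag{8.8-20}$$

Note that the term
$\frac{1}{\pi N_0} \exp\left(-\frac{(y_i^s)^2 + (y_i^p)^2 + 2\mathcal{E}_c}{N_0}\right)$
in Equation 8.8-20 is independent of $u_i$ and hence is canceled from the numerator and the
denominator of the a posteriori L values in Equation 8.8-17. It is also clear that in the
numerator of Equation 8.8-17, which corresponds to $u_i = 1$, we have
$c_i^s = \sqrt{\mathcal{E}_c}$ and in the denominator $c_i^s = -\sqrt{\mathcal{E}_c}$.
In this case the a posteriori L values simplify as

$$\begin{aligned}
L(u_i) &= \ln \frac{\sum_{(\sigma_{i-1}, \sigma_i) \in S_1} \alpha_{i-1}(\sigma_{i-1})\, P(u_i)
\exp\left(\frac{2 y_i^s c_i^s + 2 y_i^p c_i^p}{N_0}\right) \beta_i(\sigma_i)}
{\sum_{(\sigma_{i-1}, \sigma_i) \in S_0} \alpha_{i-1}(\sigma_{i-1})\, P(u_i)
\exp\left(\frac{2 y_i^s c_i^s + 2 y_i^p c_i^p}{N_0}\right) \beta_i(\sigma_i)} \\
&= \frac{4\sqrt{\mathcal{E}_c}\, y_i^s}{N_0}
+ \ln \frac{\sum_{(\sigma_{i-1}, \sigma_i) \in S_1} \alpha_{i-1}(\sigma_{i-1})\, P(u_i)
\exp\left(\frac{2 y_i^p c_i^p}{N_0}\right) \beta_i(\sigma_i)}
{\sum_{(\sigma_{i-1}, \sigma_i) \in S_0} \alpha_{i-1}(\sigma_{i-1})\, P(u_i)
\exp\left(\frac{2 y_i^p c_i^p}{N_0}\right) \beta_i(\sigma_i)} \\
&= \frac{4\sqrt{\mathcal{E}_c}\, y_i^s}{N_0} + \ln \frac{P(u_i = 1)}{P(u_i = 0)}
+ \ln \frac{\sum_{(\sigma_{i-1}, \sigma_i) \in S_1} \alpha_{i-1}(\sigma_{i-1})
\exp\left(\frac{2 y_i^p c_i^p}{N_0}\right) \beta_i(\sigma_i)}
{\sum_{(\sigma_{i-1}, \sigma_i) \in S_0} \alpha_{i-1}(\sigma_{i-1})
\exp\left(\frac{2 y_i^p c_i^p}{N_0}\right) \beta_i(\sigma_i)}
\end{aligned} \tag{8.8-21}$$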


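One problem with the version of the BCJR algorithm described above is that it is not
a numerically stable algorithm, particularly if the trellis length is long. An alternative
to this algorithm is the log-domain version of it known as the Log-APP (log a posteriori
probability) algorithm.†

†Also called Log-MAP algorithm.

In the Log-APP algorithm, instead of $\alpha$, $\beta$, and $\gamma$, we define their
logarithms as

$$\begin{aligned}
\tilde{\alpha}_i(\sigma_i) &= \ln\left(\alpha_i(\sigma_i)\right) \\
\tilde{\beta}_i(\sigma_i) &= \ln\left(\beta_i(\sigma_i)\right) \\
\tilde{\gamma}_i(\sigma_{i-1}, \sigma_i) &= \ln\left(\gamma_i(\sigma_{i-1}, \sigma_i)\right)
\end{aligned} \tag{8.8-22}$$

Straightforward calculation shows the following forward and backward recursions hold
for $\tilde{\alpha}_i(\sigma_i)$ and $\tilde{\beta}_{i-1}(\sigma_{i-1})$.

$$\begin{aligned}
\tilde{\alpha}_i(\sigma_i) &= \ln\left(\sum_{\sigma_{i-1} \in \Sigma}
\exp\left(\tilde{\alpha}_{i-1}(\sigma_{i-1}) + \tilde{\gamma}_i(\sigma_{i-1}, \sigma_i)\right)\right) \\
\tilde{\beta}_{i-1}(\sigma_{i-1}) &= \ln\left(\sum_{\sigma_i \in \Sigma}
\exp\left(\tilde{\beta}_i(\sigma_i) + \tilde{\gamma}_i(\sigma_{i-1}, \sigma_i)\right)\right)
\end{aligned} \tag{8.8-23}$$

with initial conditions

$$\tilde{\alpha}_0(\sigma_0) = \begin{cases} 0, & \sigma_0 = 0 \\ -\infty, & \sigma_0 \ne 0 \end{cases}
\qquad
\tilde{\beta}_N(\sigma_N) = \begin{cases} 0, & \sigma_N = 0 \\ -\infty, & \sigma_N \ne 0 \end{cases} \tag{8.8-24}$$

and the a posteriori L values are computed as

$$\begin{aligned}
L(u_i) &= \ln\left[\sum_{(\sigma_{i-1}, \sigma_i) \in S_1}
\exp\left(\tilde{\alpha}_{i-1}(\sigma_{i-1}) + \tilde{\gamma}_i(\sigma_{i-1}, \sigma_i) + \tilde{\beta}_i(\sigma_i)\right)\right] \\
&\quad - \ln\left[\sum_{(\sigma_{i-1}, \sigma_i) \in S_0}
\exp\left(\tilde{\alpha}_{i-1}(\sigma_{i-1}) + \tilde{\gamma}_i(\sigma_{i-1}, \sigma_i) + \tilde{\beta}_i(\sigma_i)\right)\right]
\end{aligned} \tag{8.8-25}$$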

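These relations are numerically more stable but are not computationally efficient.
To improve the computational efficiency, we can introduce the following notation:

$$
\begin{aligned}
\max{}^{*}\{x, y\} &= \ln\left(e^{x}+e^{y}\right)\\
\max{}^{*}\{x, y, z\} &= \ln\left(e^{x}+e^{y}+e^{z}\right)
\end{aligned}
\tag{8.8-26}
$$

Using these definitions, we have the recursions

$$
\begin{aligned}
\tilde{\alpha}_i(\sigma_i) &= \mathop{\max{}^{*}}_{\sigma_{i-1}\in\Sigma}\left\{\tilde{\alpha}_{i-1}(\sigma_{i-1})+\tilde{\gamma}_i(\sigma_{i-1},\sigma_i)\right\}\\
\tilde{\beta}_{i-1}(\sigma_{i-1}) &= \mathop{\max{}^{*}}_{\sigma_i\in\Sigma}\left\{\tilde{\beta}_i(\sigma_i)+\tilde{\gamma}_i(\sigma_{i-1},\sigma_i)\right\}
\end{aligned}
\tag{8.8-27}
$$

where the initial conditions for these recursions are given by Equation 8.8-24. The a
posteriori L values are given by

$$
\begin{aligned}
L(u_i) ={}& \mathop{\max{}^{*}}_{(\sigma_{i-1},\sigma_i)\in S_1}\left\{\tilde{\alpha}_{i-1}(\sigma_{i-1})+\tilde{\gamma}_i(\sigma_{i-1},\sigma_i)+\tilde{\beta}_i(\sigma_i)\right\}\\
&-\mathop{\max{}^{*}}_{(\sigma_{i-1},\sigma_i)\in S_0}\left\{\tilde{\alpha}_{i-1}(\sigma_{i-1})+\tilde{\gamma}_i(\sigma_{i-1},\sigma_i)+\tilde{\beta}_i(\sigma_i)\right\}
\end{aligned}
\tag{8.8-28}
$$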
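As an illustration of the recursions in Equations 8.8-27 and 8.8-28, the following Python sketch runs the forward and backward passes in the log domain and forms the a posteriori L values. The trellis description (one list of `(prev_state, next_state, input_bit, log_gamma)` tuples per stage) and the two-state `accumulator_trellis` helper are assumptions made for this example only; they are not notation from the text, and the branch metrics are left to the caller.

```python
import math

NEG_INF = float("-inf")

def max_star(values):
    """max*{v1, v2, ...} = ln(sum_k exp(v_k)), evaluated without overflow."""
    values = list(values)
    m = max(values, default=NEG_INF)
    if m == NEG_INF:                      # empty set, or every term impossible
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_app(branches, n_states):
    """Log-domain BCJR on a trellis that starts and ends in state 0.

    branches[i] lists (prev_state, next_state, input_bit, log_gamma)
    for trellis stage i + 1 (an assumed data layout for this sketch).
    """
    n = len(branches)
    alpha = [[NEG_INF] * n_states for _ in range(n + 1)]
    beta = [[NEG_INF] * n_states for _ in range(n + 1)]
    alpha[0][0] = 0.0                     # initial conditions, Equation 8.8-24
    beta[n][0] = 0.0
    for i in range(1, n + 1):             # forward recursion
        for s in range(n_states):
            alpha[i][s] = max_star(alpha[i - 1][p] + g
                                   for p, q, _, g in branches[i - 1] if q == s)
    for i in range(n, 0, -1):             # backward recursion
        for s in range(n_states):
            beta[i - 1][s] = max_star(beta[i][q] + g
                                      for p, q, _, g in branches[i - 1] if p == s)
    llrs = []
    for i in range(1, n + 1):             # a posteriori L values, Equation 8.8-28
        num = max_star(alpha[i - 1][p] + g + beta[i][q]
                       for p, q, u, g in branches[i - 1] if u == 1)
        den = max_star(alpha[i - 1][p] + g + beta[i][q]
                       for p, q, u, g in branches[i - 1] if u == 0)
        llrs.append(num - den)
    return llrs

def accumulator_trellis(log_gammas):
    """Hypothetical 2-state trellis: next_state = state XOR input bit.

    log_gammas[k][s][u] is the log branch metric at stage k + 1 when
    leaving state s with input u.
    """
    return [[(s, s ^ u, u, g[s][u]) for s in (0, 1) for u in (0, 1)]
            for g in log_gammas]
```

Replacing `max_star` by the built-in `max` in `log_app` would give the max-only variant of the same recursions.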


EXAMPLE 8.8-2. For the special case studied in Example 8.8-1, the expression for the a
posteriori L values can be obtained using the log-domain quantities in Equation 8.8-21.
The result is

$$
\begin{aligned}
L(u_i) ={}& \frac{4\sqrt{\mathcal{E}_c}\,y_i^s}{N_0}+L^{(a)}(u_i)+\mathop{\max{}^{*}}_{(\sigma_{i-1},\sigma_i)\in S_1}\left\{\tilde{\alpha}_{i-1}(\sigma_{i-1})+\frac{2y_i^p c_i^p}{N_0}+\tilde{\beta}_i(\sigma_i)\right\}\\
&-\mathop{\max{}^{*}}_{(\sigma_{i-1},\sigma_i)\in S_0}\left\{\tilde{\alpha}_{i-1}(\sigma_{i-1})+\frac{2y_i^p c_i^p}{N_0}+\tilde{\beta}_i(\sigma_i)\right\}
\end{aligned}
\tag{8.8-29}
$$

where we have defined $L^{(a)}(u_i)$ as

$$
L^{(a)}(u_i)=\ln\frac{P(u_i=1)}{P(u_i=0)}
\tag{8.8-30}
$$

It is seen that in this case the a posteriori L values can be written as the sum of
three terms. The first term, $4\sqrt{\mathcal{E}_c}\,y_i^s/N_0$, depends on the channel output corresponding to the
systematic bits received by the decoder. The second term, $L^{(a)}(u_i)$, depends on the a
priori probabilities of the information bits. The remaining term is the contribution of
the channel outputs corresponding to the parity bits.
It can be easily shown that (Problem 8.22)

$$
\begin{aligned}
\max{}^{*}\{x, y\} &= \max\{x, y\}+\ln\left(1+e^{-|x-y|}\right)\\
\max{}^{*}\{x, y, z\} &= \max{}^{*}\left\{\max{}^{*}\{x, y\}, z\right\}
\end{aligned}
\tag{8.8-31}
$$

The term $\ln\left(1+e^{-|x-y|}\right)$ is small when $x$ and $y$ are not close. Its maximum occurs
when $x = y$, for which this term is $\ln 2$. It is clear that for large $x$ and $y$ or when $x$ and
$y$ are not close, we can use the approximation

$$\max{}^{*}\{x, y\} \approx \max\{x, y\} \tag{8.8-32}$$

Under similar conditions we can use the approximation

$$\max{}^{*}\{x, y, z\} \approx \max\{x, y, z\} \tag{8.8-33}$$

The approximate relations in Equations 8.8-32 and 8.8-33 are valid when the
values of $x$ and $y$ (or $x$, $y$, and $z$) are not close. In general, approximating max* by
max in Equation 8.8-27 would result in a small performance degradation. The resulting
algorithm, which is a suboptimal implementation of the MAP algorithm, is called the
Max-Log-APP algorithm.†

Instead of using the approximations given in Equations 8.8-32 and 8.8-33, one
can use a lookup table for values of the correction term $\ln\left(1+e^{-|x-y|}\right)$ to improve the
performance. The interested reader is referred to Robertson and Hoeher (1997), Ryan
(2003), Robertson et al. (1995), and Lin and Costello (2004) for details.
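The three ways of evaluating max* discussed above (the exact form of Equation 8.8-31, the max-only approximation of Equation 8.8-32, and a table-based correction) can be compared with a short Python sketch. The table size and step below are arbitrary choices made for illustration; they are not taken from the cited references.

```python
import math

def max_star(x, y):
    """Exact max*{x, y} = ln(e^x + e^y), written as in Equation 8.8-31."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))

def max_star3(x, y, z):
    """Three-argument max* obtained by nesting the two-argument form."""
    return max_star(max_star(x, y), z)

def max_log(x, y):
    """Max-Log approximation of Equation 8.8-32: drop the correction term."""
    return max(x, y)

# Assumed table: ln(1 + e^{-d}) sampled at the midpoints of 8 bins of width 0.5.
STEP = 0.5
TABLE = [math.log1p(math.exp(-(k + 0.5) * STEP)) for k in range(8)]

def max_star_lut(x, y):
    """max plus a table lookup for the correction term (zero beyond the table)."""
    k = int(abs(x - y) / STEP)
    correction = TABLE[k] if k < len(TABLE) else 0.0
    return max(x, y) + correction
```

With this particular table the lookup error stays below about 0.12, whereas the max-only approximation can be off by as much as $\ln 2 \approx 0.69$ when the two arguments are equal.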


8.9 TURBO CODES AND ITERATIVE DECODING

In Section 7.13-2 we introduced serial and parallel concatenated block codes in which 
an interleaver is used to construct extremely long codes. In this section we consider the 
construction and decoding of concatenated codes with interleaving, using convolutional 
codes. 

Parallel concatenated convolutional codes (PCCCs) with interleaving, also called 
turbo codes, were introduced by Berrou et al. (1993) and Berrou and Glavieux (1996). 
A basic turbo encoder, shown in Figure 8.9-1, is a recursive systematic encoder that 
employs two recursive systematic convolutional encoders in parallel, where the second 
encoder is preceded by an interleaver. The two recursive systematic convolutional 
encoders may be either identical or different. We observe that the nominal rate at the output of the turbo encoder is $R_c = 1/3$. However, by puncturing the parity check bits at the output of the binary convolutional encoders, we may achieve higher rates, such as rate 1/2 or 2/3. As in the case of concatenated block codes, the interleaver is usually selected to be a block pseudorandom interleaver that reorders the bits in the information sequence before feeding them to the second encoder. In effect, as will be shown later, the use of two recursive convolutional encoders in conjunction with the interleaver produces a code that contains very few codewords of low weight. This characteristic does not necessarily imply that the free distance of the concatenated code is especially large. However, the use of the interleaver in conjunction with the two encoders results in codewords that have relatively few nearest neighbors. That is, the codewords are relatively sparse. Hence, the coding gain achieved by a turbo code is due in part to this feature, i.e., the reduction in the number of nearest-neighboring codewords, called the multiplicity, that results from interleaving.

FIGURE 8.9-1

Encoder for parallel concatenated code (turbo code).

†Also called Max-Log-MAP algorithm.

A standard turbo code shown in Figure 8.9-1 is completely described by the constituent codes, which are usually similar, and the interleaving pattern, usually denoted by $\Pi$. The constituent codes, being recursive and systematic, are given by their generator matrix of the form

$$
G(D)=\begin{bmatrix}1 & \dfrac{g_2(D)}{g_1(D)}\end{bmatrix}
\tag{8.9-1}
$$

where $g_1(D)$ and $g_2(D)$ specify the feedback and the feedforward connections, respectively. Usually the constituent codes are specified by the octal representation of $g_1$ and $g_2$.


EXAMPLE 8.9-1. A (31, 27) RSC encoder is represented by $g_1 = (11001)$ and $g_2 = (10111)$, corresponding to $g_1(D) = 1 + D + D^4$ and $g_2(D) = 1 + D^2 + D^3 + D^4$. The
encoder is given by the block diagram shown in Figure 8.9-2.
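The octal description in Example 8.9-1 can be turned into a working encoder with a few lines of Python. This is a minimal sketch: the function names are invented for illustration, the leftmost bit of each generator is taken as the coefficient of $D^0$ (as in the example), and both generators are assumed to have the same length with $g_1$ starting with a 1.

```python
def octal_to_taps(octal_str):
    """'31' -> [1, 1, 0, 0, 1]: coefficients of D^0, D^1, ... (leftmost bit is D^0)."""
    return [int(b) for b in bin(int(octal_str, 8))[2:]]

def rsc_encode(u, g1, g2):
    """Return (systematic bits, parity bits) for G(D) = [1, g2(D)/g1(D)].

    g1 holds the feedback taps and g2 the feedforward taps; the shift
    register starts in the all-zero state and is not terminated.
    """
    nu = len(g1) - 1
    reg = [0] * nu                      # reg[j - 1] holds the feedback bit a_{k-j}
    parity = []
    for bit in u:
        a = bit                         # a_k = u_k + sum_j g1[j] a_{k-j}  (mod 2)
        for j in range(1, nu + 1):
            a ^= g1[j] & reg[j - 1]
        p = g2[0] & a                   # p_k = sum_j g2[j] a_{k-j}  (mod 2)
        for j in range(1, nu + 1):
            p ^= g2[j] & reg[j - 1]
        parity.append(p)
        reg = [a] + reg[:-1]            # shift the register
    return list(u), parity

g1 = octal_to_taps("31")                # 1 + D + D^4
g2 = octal_to_taps("27")                # 1 + D^2 + D^3 + D^4
```

Feeding the encoder the input sequence $g_1(D)$ itself makes the feedback bit sequence equal to a single 1, so the parity output is exactly the tap pattern of $g_2(D)$; a single 1 at the input, by contrast, produces a parity sequence that never dies out.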


FIGURE 8.9-2

A (31, 27) RSC encoder.


8.9-1 Performance Bounds for Turbo Codes

Turbo codes are two recursive systematic convolutional codes concatenated by an interleaver. Although the codes are linear and time-invariant, the operation of the interleaver, although linear, is not time-invariant. The trellis of the resulting linear but time-varying finite-state machine has a huge number of states that makes maximum-likelihood decoding hopeless. In Benedetto and Montorsi (1996) it is stated that a certain turbo code that has been implemented in VLSI, when viewed as a time-varying finite-state machine, has $2^{1030}$ states, making maximum-likelihood decoding impractical.

Although maximum-likelihood decoding of turbo codes is impractical, it can serve to find an upper bound on the performance of these codes. By linearity of turbo codes, we can assume that the all-zero information sequence is transmitted. Assuming an interleaver of length $N$, there exist a total of $2^N$ possible information sequences with weights between 0 (for the all-zero sequence) and $N$. Let $m \in \{1, 2, \ldots, 2^N - 1\}$ denote the erroneous information sequence that is detected when the all-zero sequence is transmitted, and let us denote the weight of this sequence by $j_m$, where $1 \le j_m \le N$. Note that since the code is systematic, the weight of the codeword corresponding to the information sequence $m$, denoted by $w_m$, is the sum of the weight of the information sequence $j_m$ and the weight of the corresponding parity sequence. The probability of decoding $m$ when the all-zero sequence is transmitted, assuming BPSK modulation, is given by

$$P_{0\to m} = Q\left(\sqrt{2R_c w_m \gamma_b}\right) \tag{8.9-2}$$

and the corresponding bit error probability when $m$ is detected is given by

$$P_b(0\to m) = \frac{j_m}{N}\,Q\left(\sqrt{2R_c w_m \gamma_b}\right) \tag{8.9-3}$$

Using the union bound, the average bit error probability is bounded by

$$P_b \le \frac{1}{N}\sum_{m=1}^{2^N-1} j_m\,Q\left(\sqrt{2R_c w_m \gamma_b}\right) \tag{8.9-4}$$

Reordering and grouping the terms corresponding to information sequences of the same weight, we can write

$$P_b \le \sum_{j=1}^{N}\frac{j}{N}\sum_{i=1}^{\binom{N}{j}} Q\left(\sqrt{2R_c d_{ji} \gamma_b}\right) \tag{8.9-5}$$

where $\binom{N}{j}$ is the number of information sequences of weight $j$ and $d_{ji}$ is the weight of the codeword generated by the $i$th information sequence of weight $j$. Now let us consider the following cases as applied to the PCCC shown in Figure 8.9-1.
Information Sequences of Weight j = 1

An information sequence with weight 1 ($j = 1$) when applied to a recursive convolutional code generates the impulse response of the convolutional code. Since recursive convolutional codes have infinite impulse response, or very large weight impulse response even when they are terminated, the case of $j = 1$ results in large values for $d_{ji}$ and thus very low bit error probability. The only case that can cause a problem occurs when the single 1 in the input sequence occurs at the end of a block of length $N$, in which case the output weight is low. The existence of the pseudorandom interleaver, however, makes it highly likely that after interleaving the single 1 will not appear at the end of the block and thus will generate a high-weight codeword when applied to the second encoder. The probability of having a single 1 at the end of the block both before and after interleaving is very small.

Information Sequences of Weight j = 2

There exist $\binom{N}{2}$ information sequences of weight 2 corresponding to polynomials of the form $D^{i_1} + D^{i_2} = D^{i_1}\left(D^{i_2-i_1} + 1\right)$, where $0 \le i_1 < i_2 \le N - 1$, and $i_1$ and $i_2$ determine the location of the 1s in the information sequence. In general, a polynomial of this form when applied to $g_2(D)/g_1(D)$ generates parity symbols of large weight, unless $g_1(D)$ divides $D^l + 1$, where $l = i_2 - i_1$. If this is the case, then $D^l + 1 = g_1(D)h(D)$, where $h(D)$ is a polynomial. The parity sequence generated by $D^{i_1} + D^{i_2}$ in this case will be $D^{i_1}h(D)g_2(D)$, which can correspond to a low-weight parity sequence. For instance, if $g_1(D) = 1 + D + D^2$, then $g_1(D)$ divides any weight 2 sequence of the form $D^{i_1}(D^3 + 1)$, resulting in a parity polynomial of the form $D^{i_1}(1 + D)g_2(D)$, which can correspond to a parity sequence of low weight. In this example any information sequence of weight 2 in which there are two zeros between the two 1s will result in a low-weight parity sequence.† The existence of the interleaver, however, makes it highly unlikely that an information sequence of weight 2 would generate low-weight parity sequences both before and after interleaving. In fact, the number of weight 2 information sequences that generate low-weight parity polynomials before and after interleaving is much smaller than $N$, where $N$ is the interleaver length. In contrast, for a single RSCC this number is of the order of $N$.
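The divisibility condition on $D^l + 1$ is easy to check numerically. The sketch below stores a binary polynomial as a Python integer whose bit $k$ is the coefficient of $D^k$ (a representation chosen for this example) and lists the separations $l$ for which a feedback polynomial divides $D^l + 1$.

```python
def gf2_mod(a, b):
    """Remainder of a(D) divided by b(D) over GF(2).

    Each polynomial is an int whose bit k is the coefficient of D^k.
    """
    deg_b = b.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_b:
        a ^= b << (a.bit_length() - 1 - deg_b)   # cancel the leading term of a
    return a

def low_weight_separations(g1, max_l):
    """Separations l = i2 - i1 (up to max_l) for which g1(D) divides D^l + 1."""
    return [l for l in range(1, max_l + 1) if gf2_mod((1 << l) | 1, g1) == 0]

# g1(D) = 1 + D + D^2, the example in the text
print(low_weight_separations(0b111, 10))      # [3, 6, 9]
# g1(D) = 1 + D + D^4, the feedback polynomial of Example 8.9-1
print(low_weight_separations(0b10011, 31))    # [15, 30]
```

For $1 + D + D^2$ the critical separations are multiples of 3, i.e., two, five, eight, ... zeros between the two 1s. For the degree-4 feedback polynomial of Example 8.9-1 the smallest critical separation is 15, so far fewer weight 2 inputs produce a short parity response.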

A similar argument can be applied to weight 3 and weight 4 information sequences. In both cases it can be argued that due to the effect of the interleaver, the number of weight 3 and weight 4 information sequences that generate low-weight parities is much lower than $N$. This means that low-weight codewords are possible in turbo codes, but they occur very rarely. In other words, the main factor contributing to the excellent performance of turbo codes, particularly at low signal-to-noise ratios, is not their good distance structure, but the relatively low multiplicity of codewords with low weight. Note that the effect of low multiplicity of turbo codes is particularly noticeable at low signal-to-noise ratios. At higher signal-to-noise ratios, the low minimum distance of these codes results in an error floor.

If we consider information sequences of weight 2 and 3 as the main contributors to the error probability bound for turbo codes, we can approximate the bit error bound of Equation 8.9-5 as

$$P_b \le \frac{1}{N}\sum_{j=2}^{3} j\,n_j\,Q\left(\sqrt{2R_c d_{j,\min} \gamma_b}\right) \tag{8.9-6}$$

where $d_{j,\min}$ denotes the minimum codeword weight among all codewords generated by information sequences of weight $j$ and $n_j \ll N$ denotes the number of information sequences of weight $j$ that generate codewords of weight $d_{j,\min}$. Since $n_j \ll N$, the coefficient of $Q\left(\sqrt{2R_c d_{j,\min} \gamma_b}\right)$ is much smaller than 1. The effect of the factor $1/N$ that drastically reduces the error bound on turbo codes is called the interleaver gain.

†Obviously, this also applies to the case where there are five zeros between two 1s, etc.

The bounds discussed above are based on the union bounding technique that is 
loose particularly at low signal-to-noise ratios. More advanced bounding techniques 
have been studied and applied to turbo codes that provide tighter bounds at low signal- 
to-noise ratios. The interested reader is referred to Duman and Salehi (1997), Sason 
and Shamai (2000), and Sason and Shamai (2001b). 
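To get a feel for the approximate bound of Equation 8.9-6, it can be evaluated numerically as sketched below. The multiplicities $n_j$ and minimum weights $d_{j,\min}$ in the usage lines are made-up placeholder values, not data for any particular turbo code; only the formula itself comes from the text.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def approx_bit_error_bound(ebn0_db, rate, n, terms):
    """Evaluate (1/N) * sum_j j * n_j * Q(sqrt(2 R_c d_jmin gamma_b)).

    terms is a list of (j, n_j, d_jmin) triples; ebn0_db is gamma_b in dB.
    """
    gamma_b = 10.0 ** (ebn0_db / 10.0)
    total = sum(j * n_j * q_func(math.sqrt(2.0 * rate * d_min * gamma_b))
                for j, n_j, d_min in terms)
    return total / n

# Placeholder values chosen only to exercise the function.
example_terms = [(2, 3, 10), (3, 1, 9)]
for n in (1000, 10000):
    print(n, approx_bit_error_bound(1.0, 1.0 / 3.0, n, example_terms))
```

For fixed $n_j$ and $d_{j,\min}$ the result scales as $1/N$, which is the interleaver gain described above.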


8.9-2 Iterative Decoding for Turbo Codes 


We have seen that optimal decoding of turbo codes is impossible due to the large 
number of states in the code trellis. A suboptimal iterative decoding algorithm, known 
as the turbo decoding algorithm, was proposed by Berrou et al. (1993) which achieves 
excellent performance very close to the theoretical bound predicted by Shannon. 

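The turbo decoding algorithm is based on iterative usage of the Log-APP or the Max-Log-APP algorithm. As was shown in Example 8.8-2, the a posteriori L values can be written as the sum of three terms as

$$L(u_i) = L_c y_i^s + L^{(a)}(u_i) + L^{(e)}(u_i) \tag{8.9-7}$$

where

$$
\begin{aligned}
L_c y_i^s &= \frac{4\sqrt{\mathcal{E}_c}\,y_i^s}{N_0}\\
L^{(a)}(u_i) &= \ln\frac{P(u_i=1)}{P(u_i=0)}\\
L^{(e)}(u_i) &= \mathop{\max{}^{*}}_{(\sigma_{i-1},\sigma_i)\in S_1}\left\{\tilde{\alpha}_{i-1}(\sigma_{i-1})+\frac{2y_i^p c_i^p}{N_0}+\tilde{\beta}_i(\sigma_i)\right\}
-\mathop{\max{}^{*}}_{(\sigma_{i-1},\sigma_i)\in S_0}\left\{\tilde{\alpha}_{i-1}(\sigma_{i-1})+\frac{2y_i^p c_i^p}{N_0}+\tilde{\beta}_i(\sigma_i)\right\}
\end{aligned}
\tag{8.9-8}
$$

and we have defined $L_c = 4\sqrt{\mathcal{E}_c}/N_0$.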

The term $L_c y_i^s$ is called the channel L value and denotes the effect of channel outputs corresponding to the systematic bits. The second term $L^{(a)}(u_i)$ is the a priori L value and is a function of the a priori probabilities of the information sequence. The final term, $L^{(e)}(u_i)$, represents the extrinsic L value or extrinsic information, which is the part of the a posteriori L value that does not depend on the a priori probabilities and the systematic information at the channel output.

Let us assume that the binary information sequence $\boldsymbol{u} = (u_1, u_2, \ldots, u_N)$ is applied to the first rate 1/2 RSCC, and let us denote the parity bits at the output by $\boldsymbol{c}^p = (c_1^p, c_2^p, \ldots, c_N^p)$. The information sequence is passed through the interleaver to obtain $\boldsymbol{u}' = (u'_1, u'_2, \ldots, u'_N)$, and this sequence is applied to the second encoder to generate the parity sequence $\boldsymbol{c}^{\prime p} = (c_1^{\prime p}, c_2^{\prime p}, \ldots, c_N^{\prime p})$. Sequences $\boldsymbol{u}$, $\boldsymbol{c}^p$, and $\boldsymbol{c}^{\prime p}$ are BPSK modulated and transmitted over a Gaussian channel. The corresponding output sequences are denoted by $\boldsymbol{y}^s$, $\boldsymbol{y}^p$, and $\boldsymbol{y}^{\prime p}$. The MAP decoder for the first constituent code receives the pair $(\boldsymbol{y}^s, \boldsymbol{y}^p)$. In the first iteration the decoder assumes all bits are equiprobable, and therefore the a priori L values are set to zero. Having access to $(\boldsymbol{y}^s, \boldsymbol{y}^p)$, the first decoder uses Equation 8.8-29 to compute the a posteriori L values. At the output of the first constituent decoder, the decoder subtracts the channel L values (and, after the first iteration, the a priori L values) from the a posteriori L values to compute the extrinsic L values, in accordance with Equation 8.9-7. These values are denoted by $L_{12}^{(e)}(u_i)$ and are permuted by the interleaver $\Pi$ and then used by the second constituent decoder as its a priori L values. In addition to this information, the second decoder is supplied with $\boldsymbol{y}^{\prime p}$ and a permuted version of $\boldsymbol{y}^s$ after passing it through the interleaver $\Pi$. The second decoder computes the extrinsic L values denoted by $L_{21}^{(e)}(u_i)$ and after permuting them through $\Pi^{-1}$ supplies them to the first decoder, which in the next iteration uses these values as its a priori L values. This process is continued either for a fixed number of iterations or until a certain criterion is met. After the last iteration the a posteriori L values $L(u_i)$ are used to make the final decision.
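The exchange of extrinsic information described above can be outlined in Python as follows. This is only a structural sketch: `siso1` and `siso2` stand for caller-supplied Log-APP (or Max-Log-APP) decoders with an assumed interface, the interleaver is represented by an index list `perm` with $u'_k = u_{\mathrm{perm}[k]}$, and the final decision is taken from the second decoder's a posteriori values, which is one possible choice.

```python
def turbo_decode(ys, yp1, yp2, perm, siso1, siso2, lc, n_iter=8):
    """Iterate two SISO decoders, passing extrinsic L values (n_iter >= 1).

    ys, yp1, yp2: channel outputs for systematic, first and second parity bits.
    perm[k]:      index of the information bit at interleaved position k.
    siso1, siso2: hypothetical callables (systematic outputs, parity outputs,
                  a priori L values) -> a posteriori L values.
    lc:           channel reliability L_c of Equation 8.9-7.
    """
    n = len(ys)
    ys_perm = [ys[perm[k]] for k in range(n)]      # interleaved systematic outputs
    la1 = [0.0] * n                                # equiprobable bits at the start
    for _ in range(n_iter):
        l1 = siso1(ys, yp1, la1)
        # extrinsic = a posteriori - channel - a priori (Equation 8.9-7)
        le12 = [l1[k] - lc * ys[k] - la1[k] for k in range(n)]
        la2 = [le12[perm[k]] for k in range(n)]    # interleave
        l2 = siso2(ys_perm, yp2, la2)
        le21 = [l2[k] - lc * ys_perm[k] - la2[k] for k in range(n)]
        la1 = [0.0] * n
        for k in range(n):
            la1[perm[k]] = le21[k]                 # deinterleave
    l_final = [0.0] * n
    for k in range(n):
        l_final[perm[k]] = l2[k]                   # back to the original bit order
    return [1 if v > 0 else 0 for v in l_final], l_final
```

A stopping criterion (for example, unchanged hard decisions between iterations) could replace the fixed iteration count; that refinement is omitted here.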

The building block of the turbo decoder is an SISO decoder with inputs $\boldsymbol{y}^s$, $\boldsymbol{y}^p$, and $L^{(a)}(u_i)$ and outputs $L^{(e)}(u_i)$ and $L(u_i)$. In iterative decoding $L^{(a)}(u_i)$ is substituted by the extrinsic L values provided by the other decoder. The block diagram of a turbo decoder is shown in Figure 8.9-3.

A typical plot of the performance of the iterative decoding algorithm for turbo codes is given in Figure 8.9-4. It is clearly seen that the first few iterations noticeably improve the performance. It is seen from these plots that three regions are distinguishable. In the low-SNR region, the error probability changes very slowly as a function of $\mathcal{E}_b/N_0$ and the number of iterations. For moderate SNRs, the error probability drops rapidly with increasing $\mathcal{E}_b/N_0$, and over many iterations $P_b$ decreases consistently. This region is called the waterfall region or the turbo cliff region. Finally, for moderately large $\mathcal{E}_b/N_0$ values, the code exhibits an error floor which is typically achieved with a few iterations. As discussed before, the error floor effect in turbo codes is due to their low minimum distance.

FIGURE 8.9-3

Block diagram of a turbo decoder.

FIGURE 8.9-4

Performance of iterative decoding for turbo codes.

Typically, four iterations are adequate if the decoders are operating at a high enough
SNR to achieve an error rate in the range $10^{-5}$ to $10^{-6}$, whereas about eight to ten
iterations may be needed when the error rate is in the range of $10^{-5}$, where the SNR is
lower.

An important factor in the performance of the turbo code is the length of the
interleaver, which is sometimes referred to as the interleaver gain. With a sufficiently
large interleaver and iterative MAP decoding, the performance of a turbo code is very
close to the Shannon limit. For example, a rate 1/2 turbo code of block length $N = 2^{16}$
with 18 iterations of decoding per bit achieves an error probability of $10^{-5}$ at an SNR
of 0.7 dB. From Figure 6.5-6 we see that the Shannon limit for a binary input rate 1/2
code is roughly 0.19 dB. This means that this code operates 0.5 dB from the Shannon
limit.

The major drawback with decoding turbo codes with large interleavers is the decoding delay and the computational complexity inherent in the iterative decoding algorithm. In most data communication systems, however, the decoding delay is tolerable, and the additional computational complexity is usually justified by the significant coding gain that is achieved by the turbo code.

A second method for constructing concatenated convolutional codes with interleaving is serial concatenation. Benedetto et al. (1998) have investigated the construction and the performance of serial concatenated convolutional codes (SCCCs) with interleaving and have developed an iterative decoding algorithm for such codes. In comparing the error rate performance of SCCC with PCCC (turbo codes), Benedetto et al. (1998) found that SCCC generally exhibit better performance than PCCC for error rates below $10^{-2}$. For more details on the properties of turbo codes, the reader is referred to Lin and Costello (2004), Benedetto and Montorsi (1996), Heegard and Wicker (1999), and Hagenauer et al. (1996).
8.9-3 EXIT Chart Study of Iterative Decoding


Due to complexity of the iterative decoding algorithm, study of its convergence prop- 
erties is difficult. A useful tool in studying the performance of iterative decoding of 
turbo codes, particularly in the turbo cliff region, is the Extrinsic Information Transfer 
(EXIT) chart. These charts were introduced by ten Brink (2001) and have served as a 
useful tool in performance study and design of different iterative algorithms. 

In Section 8.9-2 we have seen that an iterative decoder for a standard turbo code consists of two similar SISO decoders which accept the a priori and channel information at their input and generate the extrinsic information and the log-likelihood values at the output. The two SISO decoders are connected in such a way that the extrinsic information $L^{(e)}$ of each serves as the a priori information $L^{(a)}$ for the other one. The development of the EXIT chart is based on the empirical observation (ten Brink (2001)) that the a priori L value and the transmitted systematic bits are related through

$$L^{(a)} = \frac{\sigma_a^2}{2}\,C^{(s)} + n_a \tag{8.9-9}$$

where $n_a$ is a zero-mean Gaussian random variable with variance $\sigma_a^2$, and $C^{(s)}$ denotes the normalized systematic transmitted symbol that can take values $\pm 1$. From this we conclude that

$$p_{L^{(a)}|C^{(s)}}(\ell\,|\,c) = \frac{1}{\sqrt{2\pi\sigma_a^2}}\exp\left(-\frac{\left(\ell - c\,\sigma_a^2/2\right)^2}{2\sigma_a^2}\right) \tag{8.9-10}$$

where $c = \pm 1$ with equal probability. The mutual information between $L^{(a)}$ and $C^{(s)}$ is denoted by $I_a$ and is given by

$$I_a = \frac{1}{2}\sum_{c=\pm 1}\int_{-\infty}^{\infty} p(\ell\,|\,c)\,\log_2\frac{2\,p(\ell\,|\,c)}{p(\ell\,|\,C=-1)+p(\ell\,|\,C=1)}\,d\ell \tag{8.9-11}$$


Using Equation 8.9-10 in 8.9-11 and using an approach similar to the approach taken in the derivation of Equations 6.5-31 and 6.5-32, we obtain

$$I_a = 1 - E\left[\log_2\left(1+e^{-C^{(s)}L^{(a)}}\right)\right] \tag{8.9-12}$$

where the expectation is with respect to the joint distribution of $C^{(s)}$ and $L^{(a)}$.

It is clear that $0 \le I_a \le 1$, and it can be shown to be a monotonically increasing function of $\sigma_a$; thus given the value of $I_a$, $\sigma_a$ can be uniquely determined.

A similar argument can be applied to the extrinsic information $L^{(e)}$ to derive $I_e$, the mutual information between $L^{(e)}$ and $C^{(s)}$. The extrinsic information transfer (EXIT) characteristic is defined as $I_e$ when expressed as a function of $I_a$ and $\mathcal{E}_b/N_0$, i.e.,

$$I_e = T(I_a, \mathcal{E}_b/N_0) \tag{8.9-13}$$

or simply as

$$I_e = T(I_a) \tag{8.9-14}$$
where this characteristic is plotted for different values of $\mathcal{E}_b/N_0$. Since the values of $I_a$ and $I_e$ are not given explicitly, Monte Carlo simulation is usually used to find the expected value in Equation 8.9-12. This is done over a large number of samples $N$, and $I_a$ is computed as

$$I_a \approx 1 - \frac{1}{N}\sum_{n=1}^{N}\log_2\left(1+e^{-c_n \ell_n}\right) \tag{8.9-15}$$

where $c_n$ and $\ell_n$ denote the $n$th samples of $C^{(s)}$ and $L^{(a)}$.

FIGURE 8.9-5

EXIT chart for a rate 2/3 convolutional code for different values of $\mathcal{E}_b/N_0$. [From ten Brink (2001) © IEEE.]

The EXIT chart for a (23, 37) RSCC after puncturing to increase the rate from 1/2
to 2/3 is shown in Figure 8.9-5. The plots are shown for values of $\mathcal{E}_b/N_0$ in the range
of -0.5 dB to 3 dB.
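The Monte Carlo evaluation of the expectation in Equation 8.9-12 can be sketched in Python under the Gaussian model of Equation 8.9-9. The function name, sample count, and seed below are arbitrary choices for illustration. Estimating $I_e$ would follow the same averaging, but with extrinsic L values produced by an actual SISO decoder, which is not included here.

```python
import math
import random

def estimate_ia(sigma_a, n_samples=100_000, seed=0):
    """Sample-average estimate of I_a = 1 - E[log2(1 + exp(-C * L))].

    L is drawn from the model L = (sigma_a^2 / 2) * C + n_a, where C = +/-1
    with equal probability and n_a is zero-mean Gaussian with variance sigma_a^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        c = rng.choice((-1.0, 1.0))
        l_a = 0.5 * sigma_a ** 2 * c + rng.gauss(0.0, sigma_a)
        total += math.log2(1.0 + math.exp(-c * l_a))
    return 1.0 - total / n_samples

for sigma in (0.5, 1.0, 2.0, 4.0):
    print(sigma, round(estimate_ia(sigma), 3))
```

The estimates grow from near 0 toward 1 as $\sigma_a$ increases, in line with the monotonic relation between $I_a$ and $\sigma_a$ noted above.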

For turbo codes, the extrinsic information generated by a decoder acts as the a
priori information for the next stage. To study the operation of an iterative decoder for a
turbo code, we plot the two EXIT functions of the constituent codes and move between
the two plots along the horizontal and vertical directions corresponding to equating the
extrinsic information of one decoder to the a priori information of the other, as shown
in Figure 8.9-6.

As seen in Figure 8.9-6, the iterative decoding begins with the assumption of equal probabilities for the information bits. This corresponds to $I_{a1} = 0$, and the decoding trajectory then moves horizontally and vertically between the two EXIT graphs. It is seen that when $\mathcal{E}_b/N_0 = 0.1$ dB, the two EXIT graphs intersect at low values of $I_a$ and $I_e$, as noted in the lower left corner of Figure 8.9-6. In this case after a couple of iterations no more improvement is achieved, and low values of mutual information indicate a high error probability. This behavior corresponds to the low signal-to-noise ratio region in Figure 8.9-4 and sometimes is referred to as the pinch-off region. For higher values of $\mathcal{E}_b/N_0$, the two EXIT graphs become separated and there exists a bottleneck region through which the iterative decoding trajectory climbs to high $I_a$ and $I_e$ values corresponding to low error probabilities. This region corresponds to the waterfall region in Figure 8.9-4. Finally, for large $\mathcal{E}_b/N_0$ values the graphs in the EXIT charts become wide open with fast convergence to the error floor. Figure 8.9-7 depicts another example of EXIT charts for various values of $\mathcal{E}_b/N_0$. The trajectories for $\mathcal{E}_b/N_0 = 0.7$ dB corresponding to the waterfall region and $\mathcal{E}_b/N_0 = 1.5$ dB are shown for comparison.

FIGURE 8.9-6

Simulated trajectories of iterative decoding for $\mathcal{E}_b/N_0 = 0.1$ and $\mathcal{E}_b/N_0 = 0.8$ dB. Axis label: output $I_{e2}$ of second decoder = input $I_{a1}$ of first decoder. [From ten Brink (2001) © IEEE.]

In addition to providing insight to the performance of iterative decoding schemes, 
EXIT charts have been used in the design of highly efficient codes as well as other 
iterative methods such as iterative equalization. 



FIGURE 8.9-7

EXIT chart trajectories for $\mathcal{E}_b/N_0 = 0.7$ dB and $\mathcal{E}_b/N_0 = 1.5$ dB. Simulation is done for an interleaver size of $10^6$ bits. [From ten Brink (2001) © IEEE.]
8.10 FACTOR GRAPHS AND THE SUM-PRODUCT ALGORITHM

We have observed that the trellis representation of convolutional codes is a convenient 
graphical representation that is very useful in the implementation and understanding 
of the maximum-likelihood decoding of these codes using the Viterbi algorithm or 
the symbol-by-symbol maximum a posteriori decoding using the BCJR algorithm. 
Representation of codes by more general graphical models is a convenient method in 
studying the performance of some decoding algorithms. Graph representation is not 
limited to decoding algorithms but has many applications to signal processing, circuit 
theory, control theory, networking, and probability theory. In this section we provide 
an introductory treatment of some of the basic graphical models used in the design of 
a general algorithm called the sum-product algorithm. 

The sum-product algorithm was first introduced by Gallager (1963) as a decoding 
method for low-density parity check (LDPC) codes. Later, Tanner (1981) introduced 
graphical models to describe this class of codes. These graphical models are known as 
Tanner graphs. Wiberg et al. (1995) and Wiberg (1996) showed that the Viterbi and 
BCJR algorithms as well as decoding algorithms for turbo and LDPC codes can be 
unified in a single algorithm on certain graphs. The idea of graph representation of 
codes was further developed and generalized by Forney (2001). 


8.10-1 Tanner Graphs 

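Recall that an $(n, k)$ linear block code $\mathcal{C}$ is described by a $k \times n$ generator matrix $\boldsymbol{G}$ through

$$\boldsymbol{c} = \boldsymbol{u}\boldsymbol{G} \tag{8.10-1}$$

where $\boldsymbol{u}$ is an information sequence of length $k$ and $\boldsymbol{c}$ is the corresponding codeword. A binary sequence of length $n$ is a codeword of $\mathcal{C}$ if and only if Equation 8.10-1 is satisfied for some binary sequence $\boldsymbol{u}$. The parity check matrix of this code $\boldsymbol{H}$ is an $(n - k) \times n$ binary matrix defined as the generator matrix of the dual code $\mathcal{C}^{\perp}$. A necessary and sufficient condition for $\boldsymbol{c}$ to be a codeword is that

$$\boldsymbol{c}\boldsymbol{H}^t = \boldsymbol{0} \tag{8.10-2}$$

This equation can be written in terms of $n - k$ relations

$$
\begin{aligned}
\boldsymbol{c}\boldsymbol{h}_1^t &= 0\\
\boldsymbol{c}\boldsymbol{h}_2^t &= 0\\
&\ \,\vdots\\
\boldsymbol{c}\boldsymbol{h}_{n-k}^t &= 0
\end{aligned}
\tag{8.10-3}
$$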
where $\boldsymbol{h}_i$ denotes the $i$th row of $\boldsymbol{H}$. These equations introduce a set of $n - k$ linear constraints on a codeword $\boldsymbol{c}$. For instance in a (7, 4) Hamming code with

$$
\boldsymbol{H}=\begin{bmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0\\
0 & 1 & 1 & 1 & 0 & 1 & 0\\
1 & 1 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}
\tag{8.10-4}
$$

these equations become

$$
\begin{aligned}
c_1 + c_2 + c_3 + c_5 &= 0\\
c_2 + c_3 + c_4 + c_6 &= 0\\
c_1 + c_2 + c_4 + c_7 &= 0
\end{aligned}
\tag{8.10-5}
$$

where addition is modulo-2. For a (3, 1) repetition code we have

$$
\boldsymbol{H}=\begin{bmatrix}
1 & 1 & 0\\
1 & 0 & 1
\end{bmatrix}
\tag{8.10-6}
$$

and the parity check equations become

$$
\begin{aligned}
c_1 + c_2 &= 0\\
c_1 + c_3 &= 0
\end{aligned}
\tag{8.10-7}
$$

FIGURE 8.10-1

An example of a graph.


A Tanner graph is a graphical representation of Equations 8.10-3 as a bipartite 
graph. In general, a graph is a collection of nodes (or vertices) and edges (or links) such 
that each edge connects two nodes; i.e., each edge of the graph is uniquely determined 
by the two nodes it connects. An example of a graph is shown in Figure 8.10-1. The 
degree of a node is the number of edges that are incident on that node. 

A graph is called a bipartite graph if the nodes of the graph can be partitioned into two subsets $N_1$ and $N_2$ such that each edge has one node in $N_1$ and one node in $N_2$. In other words, there exists no edge that connects two nodes both in $N_1$ or both in $N_2$. An example of a bipartite graph is shown in Figure 8.10-2.

FIGURE 8.10-2

A bipartite graph.
FIGURE 8.10-3

The Tanner graph for the (3, 1) repetition code.

A Tanner graph representation of Equations 8.10-3 can be obtained by representing each codeword component $c_i$, $1 \le i \le n$, of a codeword $\boldsymbol{c}$ as a node $i$ in $N_1$ and each of the $n - k$ constraints given by Equation 8.10-3 as a node $j$, $1 \le j \le n - k$, in $N_2$. There exists an edge connecting node $i$ in $N_1$ to node $j$ in $N_2$ if and only if $c_i$ appears in the $j$th parity check equation. Figures 8.10-3 and 8.10-4 depict the Tanner graphs for the (3, 1) repetition code and the (7, 4) Hamming code, respectively. Note that since $\boldsymbol{H}$ for a code is not unique, its Tanner graph is not unique either.

One major difference between the two graphs shown in Figures 8.10-3 and 8.10-4
is that the first graph does not include cycles; that is, a path on the edges does not exist
that starts from a node and ends in the same node. However, the second graph includes
cycles, as clearly seen on the graph. A cycle-free graph is a graph in which removing 
any edge divides the graph into two disconnected graphs. The length of the shortest 
cycle included in a graph is called the girth of the graph. The girth of the graph shown 
in Figure 8.10-4 is 4. 
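The construction of a Tanner graph from a parity check matrix, and the check for length-4 cycles, can be written out in a few lines of Python. The matrices below are the ones implied by the parity check equations for the (7, 4) Hamming code and the (3, 1) repetition code; the function names and the 0-based node indices are choices made for this sketch.

```python
from itertools import combinations

H_HAMMING = [[1, 1, 1, 0, 1, 0, 0],
             [0, 1, 1, 1, 0, 1, 0],
             [1, 1, 0, 1, 0, 0, 1]]
H_REPETITION = [[1, 1, 0],
                [1, 0, 1]]

def tanner_edges(h):
    """Edges (variable node i, constraint node j), one per 1 in H (0-based)."""
    return [(i, j) for j, row in enumerate(h) for i, bit in enumerate(row) if bit]

def has_four_cycle(h):
    """True if two constraint nodes share at least two variable nodes.

    In a bipartite graph the shortest possible cycle has length 4, so this
    condition is equivalent to the graph having girth 4.
    """
    return any(sum(a & b for a, b in zip(r1, r2)) >= 2
               for r1, r2 in combinations(h, 2))

print(len(tanner_edges(H_HAMMING)), has_four_cycle(H_HAMMING))        # 12 True
print(len(tanner_edges(H_REPETITION)), has_four_cycle(H_REPETITION))  # 4 False
```

For the Hamming code, the first and third constraints both involve $c_1$ and $c_2$, which is one of the length-4 cycles; the repetition code graph has 5 nodes and 4 edges and is a tree.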

In the Tanner graph of Figure 8.10-4 two types of nodes are distinguishable: the variable nodes, which correspond to the variables supplied to the Tanner graph (these are the nodes denoted by circles on the left), and the constraint nodes that force a relation between the variables. These nodes are denoted by squares on the right. A binary sequence $\boldsymbol{c}$ is a codeword if it satisfies the three constraints given by Equations 8.10-5. Let us define the indicator function of a proposition $P$ as

$$
\delta[P]=\begin{cases}1 & \text{if } P \text{ is true}\\ 0 & \text{if } P \text{ is false}\end{cases}
\tag{8.10-8}
$$

Then, for instance,

$$
\delta[c_1+c_2+c_3+c_5=0]=\begin{cases}1 & \text{if } c_1+c_2+c_3+c_5=0\\ 0 & \text{if } c_1+c_2+c_3+c_5=1\end{cases}
\tag{8.10-9}
$$

and $\boldsymbol{c}$ is a codeword if

$$
\delta[c_1+c_2+c_3+c_5=0]\,\delta[c_2+c_3+c_4+c_6=0]\,\delta[c_1+c_2+c_4+c_7=0]=1
\tag{8.10-10}
$$

FIGURE 8.10-4

The Tanner graph for the (7, 4) Hamming code.



The graph shown in Figure 8.10-4 is a graphical representation of the relation given 
by Equation 8.10-10. We note that the product function of Equation 8.10-10 which 
represents a global constraint for c to be a codeword can be factored into three local 
constraints. Any input to this graph is a valid input if it results in a nonzero global value 
for the global equation of the graph; and this can occur only if the input is a codeword. 
Tanner graphs are special cases of factor graphs to be studied in the next section. 
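The global constraint of Equation 8.10-10 can be evaluated directly as a product of three local indicator functions. The short Python sketch below does this for every binary sequence of length 7; the `CHECKS` list simply records which components appear in each parity check equation, and the names are chosen for this example.

```python
from itertools import product

# Components (1-based) taking part in each of the three parity checks.
CHECKS = [(1, 2, 3, 5), (2, 3, 4, 6), (1, 2, 4, 7)]

def delta(proposition):
    """Indicator function: 1 if the proposition is true, 0 otherwise."""
    return 1 if proposition else 0

def global_function(c):
    """Product of the three local constraints for a length-7 binary tuple c."""
    value = 1
    for check in CHECKS:
        value *= delta(sum(c[i - 1] for i in check) % 2 == 0)
    return value

codewords = [c for c in product((0, 1), repeat=7) if global_function(c) == 1]
print(len(codewords))    # 16
```

Exactly $2^4 = 16$ of the 128 binary sequences give a nonzero global value, as expected for a (7, 4) code.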


8.10-2 Factor Graphs 

Let us assume that f(x 1 , xi, . . . , x„) is a real- valued function of n variables xi, ... ,x n 
where x, takes values in a discrete set X. Assume we are interested in computing a 
marginal function of one variable f) (x, ) as 

fi(Xi) = " ■'52f( x t’ x 2< ■ ■ ■ > x n) ( 8 . 10 - 11 ) 

X\ X2 Xi - 1 X [- pi X n 

This, for instance, can be the case if we have the joint PDF of 11 random variables 
and want to compute the marginal PDF of x,. If the size of the set X is [ X \ , then 
computing this sum requires \X\ n ~ l operations. If we use the the shorthand notation 
~x, to indicate summation over all variables except x,-, then Equation 8.10-1 1 can be 
written in the more compact form 

= ^ /(*!,...,*„) ( 8 . 10 - 12 ) 

Computation of $f_i(x_i)$ can be made considerably simpler if the global function 
$f(x_1, x_2, \ldots, x_n)$ can be factored into a product of local functions, each depending on a 
subset of the variables, i.e., if for $\boldsymbol{x} = (x_1, x_2, \ldots, x_n)$ we can write 

\[ f(\boldsymbol{x}) = \prod_{m=1}^{M} g_m(\boldsymbol{x}_m) \tag{8.10-13} \]

where $\boldsymbol{x}_m$, $1 \le m \le M$, is a subset of components of $\boldsymbol{x}$. For instance, in the case 
where 

\[ f(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8) = g_1(x_1)\, g_2(x_2)\, g_3(x_1, x_2, x_3, x_4)\, g_4(x_4, x_5, x_6)\, g_5(x_5)\, g_6(x_6, x_7, x_8)\, g_7(x_7) \tag{8.10-14} \]

we have 

\[ f_4(x_4) = \left( \sum_{x_1, x_2, x_3} g_1(x_1)\, g_2(x_2)\, g_3(x_1, x_2, x_3, x_4) \right) \times \left( \sum_{x_5, x_6} g_4(x_4, x_5, x_6)\, g_5(x_5) \left( \sum_{x_7, x_8} g_6(x_6, x_7, x_8)\, g_7(x_7) \right) \right) \tag{8.10-15} \]

which requires less computation than the general case. 
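
The saving promised by Equation 8.10-15 can be checked numerically. The sketch below is only an illustration under our own assumptions: the local functions are filled with random nonnegative numbers, the alphabet size `q = 3` is arbitrary, and NumPy's `einsum` is used to carry out the sums.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 3  # alphabet size |X| (arbitrary choice for this sketch)

# Random tables standing in for the local functions of Equation 8.10-14.
g1, g2, g5, g7 = (rng.random(q) for _ in range(4))
g3 = rng.random((q, q, q, q))   # g3(x1, x2, x3, x4)
g4 = rng.random((q, q, q))      # g4(x4, x5, x6)
g6 = rng.random((q, q, q))      # g6(x6, x7, x8)

# Brute force: tabulate f over all q**8 inputs, then sum out all but x4.
# Index letters a..h stand for x1..x8.
f = np.einsum("a,b,abcd,def,e,fgh,g->abcdefgh", g1, g2, g3, g4, g5, g6, g7)
f4_brute = f.sum(axis=(0, 1, 2, 4, 5, 6, 7))

# Factored form of Equation 8.10-15: three small sums.
left = np.einsum("a,b,abcd->d", g1, g2, g3)      # sum over x1, x2, x3
inner = np.einsum("fgh,g->f", g6, g7)            # sum over x7, x8
right = np.einsum("def,e,f->d", g4, g5, inner)   # sum over x5, x6
f4_factored = left * right

print(np.allclose(f4_brute, f4_factored))  # True
```

The brute-force route touches all $q^8$ entries of $f$, whereas the factored route never builds a table larger than $q^4$.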

Let us assume that $f(\boldsymbol{x})$ is given by Equation 8.10-13. Then a factor graph 
representing this global function is a graph consisting of $M$ nodes and $n$ edges or half-edges. 
An edge connects two nodes, and a half-edge just represents a value entering a node. 
Therefore a half-edge on one side is connected to a node and on the other side is free. 
Each edge or half-edge of the factor graph uniquely represents a variable, and each 
node uniquely represents a local function. Since we are assuming that each edge or 
half-edge uniquely represents a variable, this representation is possible only if each 
variable appears in at most two local functions. We will see shortly how this limitation 
can be removed. 

EXAMPLE 8.10-1. The factor graph representing 

\[ p(w, u, v, x_1, x_2, y) = p(u, v, w)\, p(x_1 \mid u)\, p(x_2 \mid v)\, p(y \mid x_1, x_2) \tag{8.10-16} \]

is shown in Figure 8.10-5. Note that the two half-edges correspond to the variables $w$ and 
$y$, each of which appears in just one local function. 

If a variable appears in more than two local functions, we introduce a cloning node 
that makes copies of this variable. Then we can supply these copies to local functions 
(nodes on the graph) that need them. A cloning node is given by equality constraints. 

EXAMPLE 8.10-2. Let us consider the function 

\[ f(x_1, x_2, x_3, x_4, x_5) = g_1(x_1, x_2)\, g_2(x_1, x_3)\, g_3(x_1, x_4)\, g_4(x_3, x_4, x_5) \tag{8.10-17} \]



FIGURE 8.10-5 

Factor graph representing Equation 8.10-16. 
FIGURE 8.10-6 

The factor graph representing Equation 8.10-17. 


In this function the variable $x_1$ appears in three local functions and hence has to be 
cloned. The factor graph in Figure 8.10-6 shows how the equality constraint is 
introduced to carry out this cloning. The equality constraint is a local function of the 
form 

\[ g_=(x_1, x_1', x_1'') = \delta[x_1 = x_1']\, \delta[x_1 = x_1''] \tag{8.10-18} \]

This means that the value of this local function is 1 if and only if $x_1 = x_1' = x_1''$. If 
this constraint is not satisfied, the value of the function is zero, making the value of 
the global function zero. This means that for such values of $(x_1, x_1', x_1'')$ the value of 
the global function is not positive, and hence such a combination is not a valid input. 
Introducing $g_=$ as in Equation 8.10-18 makes it possible to have a variable in more 
than two local functions. 

EXAMPLE 8.10-3. The factor graph representation of the Tanner graph for the Hamming 
code shown in Figure 8.10-4 is shown in Figure 8.10-7. 



FIGURE 8.10-7 

The factor graph representation for a (7, 4) Hamming code. 
FIGURE 8.10-8 

The factor graph representation of the function in Equation 8.10-14. 


8.10-3 The Sum-Product Algorithm 

The sum-product algorithm is an efficient algorithm for computing marginals of the 
form 

\[ f_i(x_i) = \sum_{\sim x_i} f(x_1, x_2, \ldots, x_n) \tag{8.10-19} \]

using the factor graph for $f(x_1, \ldots, x_n)$. The basic idea is to sum over some of the 
variables and then transmit two different messages in opposite directions across each 
edge of the factor graph. The messages transmitted across each edge are functions of the 
variable corresponding to that edge. These functions are usually expressed as vectors 
whose components represent different values that these functions can take for different 
values of the edge variable. This means that the dimensionality of the vector for each 
edge is equal to the cardinality of the variable represented by that edge. In applications 
of this algorithm to coding problems, since variables are usually binary, the vectors 
representing the messages are two-dimensional vectors. A more convenient way in this 
case, where the messages usually represent the probabilities of the variable being equal 
to 0 or 1, is to use the ratio of the probabilities (likelihood ratio) or its logarithm (the 
log-likelihood ratio LLR). 

Let us consider the marginal represented by Equation 8.10-15 as† 

\[ f_4(x_4) = \left( \sum_{x_1, x_2, x_3} g_1(x_1)\, g_2(x_2)\, g_3(x_1, x_2, x_3, x_4) \right) \times \left( \sum_{x_5, x_6} g_4(x_4, x_5, x_6)\, g_5(x_5) \left( \sum_{x_7, x_8} g_6(x_6, x_7, x_8)\, g_7(x_7) \right) \right) \tag{8.10-20} \]

The factor graph for $f(x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8)$ is represented by Figure 8.10-8, 
where elements in the boxes correspond to the partial sums in Equation 8.10-20. 


†This example is taken from Loeliger (2004). 

We define 

\[ \begin{aligned} \mu_{g_3 \to x_4}(x_4) &= \sum_{x_1, x_2, x_3} g_1(x_1)\, g_2(x_2)\, g_3(x_1, x_2, x_3, x_4) \\ \mu_{g_6 \to x_6}(x_6) &= \sum_{x_7, x_8} g_6(x_6, x_7, x_8)\, g_7(x_7) \\ \mu_{g_4 \to x_4}(x_4) &= \sum_{x_5, x_6} g_4(x_4, x_5, x_6)\, g_5(x_5)\, \mu_{g_6 \to x_6}(x_6) \end{aligned} \tag{8.10-21} \]

as the messages passed at $g_3$, $g_6$, and $g_4$, respectively. Referring to Figure 8.10-8, 
we note that $\mu_{g_6 \to x_6}(x_6)$ is the message passed out of the inner box summarizing its 
content, and $\mu_{g_3 \to x_4}$ and $\mu_{g_4 \to x_4}$ are the two messages sent in opposite directions on the 
edge corresponding to variable $x_4$. Equation 8.10-20 states that the marginal $f_4(x_4)$ 
is the product of the two messages passed along the edge corresponding to $x_4$. What 
we have done here is to summarize each subsystem in turn and use 
the result to summarize the next system. The resulting algorithm, known as the 
sum-product algorithm, can be summarized as follows. Each node corresponding to a local 
function $g(x_1, x_2, \ldots, x_n)$ receives messages corresponding to local variables $x_j$ on 
the branches corresponding to these variables. The received messages are denoted by 
$\mu_{x_j \to g}(x_j)$. Based on these messages the node computes the outgoing message $\mu_{g \to x_i}(x_i)$ 
and sends it over the branch corresponding to $x_i$. A diagram representing this process 
is shown in Figure 8.10-9. 

The outgoing messages are computed using the relation 

\[ \mu_{g \to x_i}(x_i) = \sum_{\sim x_i} g(x_1, \ldots, x_n) \prod_{j \ne i} \mu_{x_j \to g}(x_j) \tag{8.10-22} \]

where $\mu_{x_j \to g}(x_j)$ is the incoming message on edge $j$ corresponding to variable $x_j$. Note 
that in computing the outgoing message on the edge corresponding to $x_i$, we have used 
all incoming messages except the message corresponding to edge $x_i$. This is equivalent 
to saying that the extrinsic information is passed over the edge $x_i$. For some special nodes 
the following rules are followed: 

1. The message sent over a half-edge to the (single) node connecting to it is a message 
with value 1. 

2. If $g$ is a function of a single variable $x_i$, then the product term in Equation 8.10-22 
becomes empty and the equation reduces to 

\[ \mu_{g \to x_i}(x_i) = g(x_i) \tag{8.10-23} \]

3. For a cloning node $g_=$ with equality constraint, simple substitution in 
Equation 8.10-22 yields 

\[ \mu_{g_= \to x_i}(x_i) = \prod_{j \ne i} \mu_{x_j \to g_=}(x_i) \tag{8.10-24} \]

FIGURE 8.10-9 

The local computation in sum-product algorithm. 
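
The local update of Equation 8.10-22 can be sketched in a few lines of Python. This is a minimal illustration, not a full sum-product implementation: the function name `outgoing_message`, the representation of a local function as a NumPy array, and the example message values are all our own assumptions.

```python
import numpy as np

def outgoing_message(g, incoming, i):
    """Message from node g to edge i, following Equation 8.10-22.

    g        : n-dimensional array holding the local function g(x1, ..., xn)
    incoming : list of n 1-D arrays, incoming[j] being the message on edge j
               (the entry for edge i is ignored and may be None)
    """
    result = g.astype(float)
    for j, msg in enumerate(incoming):
        if j == i:
            continue                       # extrinsic: skip the edge we send on
        shape = [1] * g.ndim
        shape[j] = len(msg)
        result = result * np.reshape(msg, shape)   # weight axis j by its message
    other_axes = tuple(j for j in range(g.ndim) if j != i)
    return result.sum(axis=other_axes)     # sum over all variables except x_i

# Binary equality (cloning) node with three edges.
eq = np.zeros((2, 2, 2))
eq[0, 0, 0] = eq[1, 1, 1] = 1.0
m1, m2 = np.array([0.9, 0.1]), np.array([0.6, 0.4])
out = outgoing_message(eq, [None, m1, m2], 0)
print(np.allclose(out, m1 * m2))  # True
```

For the equality node the general rule collapses to the componentwise product of the other incoming messages, in agreement with Equation 8.10-24. A half-edge is handled by supplying an all-ones message.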

There exists a sharp contrast in applying the sum-product algorithm to cycle-free 
graphs and graphs with cycles. In a cycle-free graph, the sum-product algorithm can start 
from all leaves of the graph and proceed along the nodes as their incoming messages 
become available. Since the graph is cycle-free, each message is computed only once. 
After this step is done, the marginals corresponding to each variable can be found as the 
product of the two messages sent in opposite directions on the edge corresponding to 
that variable. For cycle-free graphs the sum-product algorithm converges to the correct 
marginals in a finite number of steps. If the graph has cycles, then the convergence 
of the algorithm is not guaranteed. However, in many practical cases of interest the 
algorithm converges even for graphs that include cycles. 

Factor Graph of a Code 

For a code $C$ with codewords $\boldsymbol{c}_i$, $1 \le i \le M$, the global function can be written as 
$\delta[\boldsymbol{c} \in C]$. If $\boldsymbol{c}$ is a codeword, then this function is equal to 1, indicating that $\boldsymbol{c}$ is a valid 
input. For a noncodeword sequence, the value of the global function is zero, indicating 
that the input is not valid. 

Depending on the code characteristics this global function can be factorized 
differently. For instance, for convolutional codes this function can be written as the product 
of the conditions that each component of $\boldsymbol{c}$ must be part of a path through the code 
trellis and, therefore, must correspond to a transition between states $\sigma_{i-1}$ and $\sigma_i$. For the 
(7, 4) Hamming code the global function can be written as the product of three parity 
check (local) functions as 

\[ \delta[\boldsymbol{c} \in C] = \delta[c_1 + c_2 + c_3 + c_5 = 0]\,\delta[c_2 + c_3 + c_4 + c_6 = 0]\,\delta[c_1 + c_2 + c_4 + c_7 = 0] \tag{8.10-25} \]

In binary block codes two types of nodes are present in the factor graph of the 
code: the $n - k$ constraint nodes that represent the $n - k$ parity check equations of the 
form $\boldsymbol{c}\boldsymbol{h}_s^t = 0$ for $1 \le s \le n - k$ and the equality constraint nodes (cloning nodes) 
corresponding to codeword components that appear in more than two parity check 
equations. We have already seen that for the equality constraint nodes 

\[ \mu_{g_= \to x_i}(x_i) = \prod_{j \ne i} \mu_{x_j \to g_=}(x_i) \tag{8.10-26} \]
For the parity check nodes, if the messages are two-dimensional vectors representing 
the probability of the edge variable being 0 or 1,† we can show that (see Problem 8.25) 

\[ \begin{aligned} \mu_{g \to c_i}(c_i = 0) &= \frac{1}{2} + \frac{1}{2} \prod_{j \ne i} \bigl(1 - 2 p_j(1)\bigr) \\ \mu_{g \to c_i}(c_i = 1) &= \frac{1}{2} - \frac{1}{2} \prod_{j \ne i} \bigl(1 - 2 p_j(1)\bigr) \end{aligned} \tag{8.10-27} \]

where $p_j(1)$ denotes the incoming probability that the $j$th edge takes the value 1. 
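
Equation 8.10-27 can be verified numerically against a direct enumeration. In the sketch below, which is ours and not part of the text, `p1` lists the incoming probabilities $p_j(1)$ on the other edges of a parity check node, and the brute-force version sums over every setting of those edges.

```python
from itertools import product
import numpy as np

def parity_message_closed_form(p1):
    """Equation 8.10-27; p1[j] is the incoming probability that edge j equals 1."""
    prod = np.prod(1.0 - 2.0 * np.asarray(p1))
    return np.array([0.5 + 0.5 * prod, 0.5 - 0.5 * prod])

def parity_message_brute_force(p1):
    """Enumerate the other edges; even overall parity forces c_i to equal their parity."""
    out = np.zeros(2)
    for bits in product((0, 1), repeat=len(p1)):
        weight = np.prod([p if b else 1.0 - p for b, p in zip(bits, p1)])
        out[sum(bits) % 2] += weight
    return out

p1 = [0.1, 0.3, 0.8]   # arbitrary example values
print(np.allclose(parity_message_closed_form(p1), parity_message_brute_force(p1)))  # True
```

The closed form needs one product over the incoming edges, while the enumeration grows exponentially with the degree of the node.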


8.10-4 MAP Decoding Using the Sum-Product Algorithm 


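A code $C$ with codewords $\boldsymbol{c}_m$, $1 \le m \le M$, is used for communication over a memoryless 
channel. Codeword $\boldsymbol{c}$ is transmitted over the channel and $\boldsymbol{y}$ is received, and at the decoder 
we are interested in performing symbol-by-symbol maximum a posteriori decoding that 
maximizes $p(c_i \mid \boldsymbol{y})$. This can be written as 

\[ \begin{aligned} \hat{c}_i &= \arg\max_{1 \le m \le M} p(c_{mi} \mid \boldsymbol{y}) \\ &= \arg\max_{1 \le m \le M} \sum_{\sim c_{mi}} p(\boldsymbol{c}_m \mid \boldsymbol{y}) \\ &= \arg\max_{1 \le m \le M} \sum_{\sim c_{mi}} p(\boldsymbol{c}_m)\, p(\boldsymbol{y} \mid \boldsymbol{c}_m) \\ &= \arg\max_{1 \le m \le M} \sum_{\sim c_{mi}} p(\boldsymbol{c}_m) \prod_{j=1}^{n} p(y_j \mid c_{mj}) \end{aligned} \tag{8.10-28} \]

This quantity has to be computed over all codewords $\boldsymbol{c}_m$. 

For an arbitrary binary sequence of length $n$ denoted by $\boldsymbol{c}$ we have 

\[ p(\boldsymbol{c}) = \begin{cases} \dfrac{1}{M} & \text{if } \boldsymbol{c} \in C \\ 0 & \text{otherwise} \end{cases} \tag{8.10-29} \]

or equivalently we can write 

\[ p(\boldsymbol{c}) = \frac{1}{M}\, \delta[\boldsymbol{c} \in C] \tag{8.10-30} \]

The MAP decoding rule then becomes 

\[ \hat{c}_i = \arg\max \sum_{\sim c_i} \delta[\boldsymbol{c} \in C] \prod_{j=1}^{n} p(y_j \mid c_j) \tag{8.10-31} \]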


†Or, equivalently, when the incoming two-dimensional message vector to each node is appropriately 
normalized such that the two components add to 1, i.e., if the messages are $\left( \frac{\mu(0)}{\mu(0) + \mu(1)}, \frac{\mu(1)}{\mu(0) + \mu(1)} \right)$. 

FIGURE 8.10-10 

The code-channel factor graph for a (7, 4) Hamming 
code. 
The factor $\delta[\boldsymbol{c} \in C]$ determines the factor graph of the code, and factors $p(y_i \mid c_i)$ 
are nodes (functions) connected to the inputs (variable nodes) of the code factor graph 
with $y_i$ as the input and $p(y_i \mid c_i)$ as the node function. The resulting factor graph for 
a (7, 4) Hamming code is shown in Figure 8.10-10. In this graph the leftmost squares 
represent the channel conditional probabilities $p(y_i \mid c_i)$. 

The decoding process begins by supplying the channel outputs $y_i$ as the variables 
to the variable nodes of the code-channel factor graph. Using the values of $p(y_i \mid c_i)$ and 
Equations 8.10-31 and 8.10-27, the decoder can apply the sum-product algorithm to 
find the marginal probabilities of each edge variable. The iterations are continued either 
for a fixed number of times or until a stopping criterion is satisfied. One such stopping 
criterion can be $\boldsymbol{c}\boldsymbol{H}^t = \boldsymbol{0}$. 
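
For a code as small as the (7, 4) Hamming code, the marginals that the sum-product algorithm aims at can also be obtained by direct enumeration, which gives a useful reference when testing a message-passing decoder. The sketch below evaluates the sum in Equation 8.10-31 by brute force rather than by message passing. The binary symmetric channel with crossover probability `p` and the function names are our assumptions for illustration.

```python
from itertools import product
import numpy as np

# 0-based form of the three parity checks in Equation 8.10-25.
CHECKS = [(0, 1, 2, 4), (1, 2, 3, 5), (0, 1, 3, 6)]

def is_codeword(c):
    """delta[c in C] expressed as a boolean."""
    return all(sum(c[i] for i in chk) % 2 == 0 for chk in CHECKS)

def bitwise_map_bsc(y, p):
    """Symbol-by-symbol MAP decisions over a BSC (assumed channel model)."""
    n = len(y)
    marginals = np.zeros((n, 2))
    for c in product((0, 1), repeat=n):
        if not is_codeword(c):             # the factor delta[c in C] is zero
            continue
        likelihood = np.prod([p if ci != yi else 1.0 - p for ci, yi in zip(c, y)])
        for i, ci in enumerate(c):
            marginals[i, ci] += likelihood  # accumulate the sum over ~c_i
    return marginals.argmax(axis=1), marginals

y = [0, 0, 1, 0, 0, 0, 0]                  # all-zero codeword with one bit error
decisions, _ = bitwise_map_bsc(y, p=0.05)
print(decisions.tolist())  # [0, 0, 0, 0, 0, 0, 0]
```

The enumeration costs $2^n$ evaluations and is therefore only practical for toy codes; the point of the factor graph is to avoid it.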


8.11 

LOW DENSITY PARITY CHECK CODES 

Low density parity check codes (LDPCs) are linear block codes that are characterized 
by a sparse parity check matrix. These codes were originally introduced in Gallager 
(1960, 1963), but were not widely studied for the next twenty years. Although Tanner 
(1981) introduced the graphical representation of these codes, it was not until after 
the introduction of turbo codes and the iterative decoding algorithm that these codes 
were rediscovered by MacKay and Neal (1996) and MacKay (1999). Since then these 
codes have been the topic of active research in the coding community motivated by 
the excellent performance of these codes, which is realized by using iterative decoding 
schemes based on the sum-product algorithm. In fact, it has been shown that these 
codes are competitors to turbo codes in terms of performance and, if well designed, 
have better performance than turbo codes. Their excellent performance has resulted in 
their adoption in several communication and broadcasting standards. 

Low density parity check codes are linear block codes with very large codeword 
length $n$, usually in the thousands. The parity check matrix $\boldsymbol{H}$ for these codes is a large 
matrix with very few 1s in it. The term low density refers to the low density of 1s in the 
parity check matrix of these codes. 

A regular low density parity check code can be defined as a linear block code with a 
sparse $m \times n$ parity check matrix $\boldsymbol{H}$ satisfying the following properties. 

1. There are $w_r$ 1s in each row of $\boldsymbol{H}$, where $w_r \ll \min\{m, n\}$. 

2. There are $w_c$ 1s in each column of $\boldsymbol{H}$, where $w_c \ll \min\{m, n\}$. 

The density of a low-density parity check code, denoted by $r$, is defined as the ratio of 
the total number of 1s in $\boldsymbol{H}$ to the total number of elements in $\boldsymbol{H}$. The density is given 
by 

\[ r = \frac{w_r}{n} = \frac{w_c}{m} \tag{8.11-1} \]

from which it is clear that 

\[ \frac{m}{n} = \frac{w_c}{w_r} \tag{8.11-2} \]

If the matrix $\boldsymbol{H}$ is full rank, then $m = n - k$ and 

\[ R_c = 1 - \frac{m}{n} = 1 - \frac{w_c}{w_r} \tag{8.11-3} \]

otherwise, 

\[ R_c = 1 - \frac{\operatorname{rank}(\boldsymbol{H})}{n} \tag{8.11-4} \]
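
The relations in Equations 8.11-1 through 8.11-4 can be tried out on a small matrix. The $3 \times 6$ matrix below is a toy example of our own with $w_r = 4$ and $w_c = 2$; it is far too small and dense to be a useful LDPC code, but it is regular and rank deficient, so it exercises both rate formulas. The helper `gf2_rank` is a hypothetical name for a plain Gaussian elimination over GF(2).

```python
import numpy as np

def gf2_rank(H):
    """Rank of a binary matrix over GF(2) by Gaussian elimination."""
    A = np.array(H, dtype=np.uint8) % 2
    rows, cols = A.shape
    rank = 0
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if A[r, col]), None)
        if pivot is None:
            continue                       # no pivot in this column
        A[[rank, pivot]] = A[[pivot, rank]]  # move the pivot row up
        for r in range(rows):
            if r != rank and A[r, col]:
                A[r] ^= A[rank]            # clear the column (addition mod 2)
        rank += 1
        if rank == rows:
            break
    return rank

H = np.array([[1, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1, 1],
              [1, 1, 0, 0, 1, 1]])
m, n = H.shape
w_r, w_c = 4, 2

density = H.sum() / H.size                 # equals w_r / n = w_c / m
design_rate = 1 - w_c / w_r                # Equation 8.11-3, valid for full rank
actual_rate = 1 - gf2_rank(H) / n          # Equation 8.11-4
print(density, design_rate, actual_rate)
```

Here the third row is the modulo-2 sum of the first two, so the rank is 2 and the actual rate exceeds the value $1 - w_c/w_r$ that would hold for a full-rank matrix.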


The Tanner graph of a regular low density parity check code consists of the usual 
constraint and variable nodes. The low density constraint of the code, however, makes 
the degree of all constraint (parity-check) nodes equal to w r which is much less than 
the code block length. Similarly the degree of all variable nodes is equal to w c . The 
Tanner graph for an LDPC code is shown in Figure 8.11-1. 

The Tanner graph of LDPC codes usually is a graph with cycles. We have previously 
defined the girth of a graph as the length of the shortest cycle in that graph. Obviously 
a bipartite graph with cycles has a girth that is at least equal to 4. A common decoding 
technique used for LDPC codes is the sum-product algorithm discussed in the preceding 
section. This algorithm is effective when the girth of the Tanner graph of the LDPC 
code is large. The reason for this behavior is that in order for the sum-product algorithm 
to be effective on a graph with cycles, the value of the extrinsic information must be 
high. If the girth of the LDPC code is low, the information corresponding to a bit loops 
back to itself very soon, hence providing a small amount of extrinsic information and 
resulting in poor performance. Design techniques for LDPC codes with large girth are 
a topic of active research. We have seen in the preceding section that if the Tanner graph 
of a code has no cycles, then the sum-product algorithm converges in a finite number of 
steps. However, it has been shown that high-rate LDPC codes whose graph is cycle-free 
have low minimum distance and hence their bit error rate performance is poor. 

FIGURE 8.11-1 

The Tanner graph for a regular LDPC 
code with $w_r = 4$ and $w_c = 3$. 

An irregular LDPC code is one in which the number of 1s in rows and columns 
of $\boldsymbol{H}$ is low but is not constant for all rows and columns. Irregular low density parity 
check codes are usually described in terms of two degree distribution polynomials $\lambda(x)$ 
and $\rho(x)$, for variable nodes and constraint nodes, respectively. These polynomials are 
defined as 

\[ \begin{aligned} \lambda(x) &= \sum_{d=1}^{d_v} \lambda_d\, x^{d-1} \\ \rho(x) &= \sum_{d=1}^{d_c} \rho_d\, x^{d-1} \end{aligned} \tag{8.11-5} \]

where $\lambda_d$ and $\rho_d$ denote the fraction of all edges connected to variable and constraint 
nodes of degree $d$, respectively. It is clear that for a regular LDPC code we have 

\[ \begin{aligned} \lambda(x) &= x^{w_c - 1} \\ \rho(x) &= x^{w_r - 1} \end{aligned} \tag{8.11-6} \]

Very long irregular LDPC codes have been designed to operate within 0.0045 dB of 
the Shannon limit (see Chung et al. (2001)). 
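
The coefficients $\lambda_d$ and $\rho_d$ of Equation 8.11-5 can be read off a parity check matrix by counting edges. The following sketch uses a tiny irregular matrix of our own choosing, and the function names are hypothetical. A node of degree $d$ contributes $d$ edges, so the edge fraction for degree $d$ is $d$ times the number of such nodes divided by the total number of 1s in $\boldsymbol{H}$.

```python
from collections import Counter
import numpy as np

def edge_degree_fractions(degrees):
    """Map each degree d to the fraction of all edges attached to nodes of degree d."""
    total_edges = sum(degrees)
    counts = Counter(degrees)
    return {d: d * counts[d] / total_edges for d in sorted(counts)}

def degree_distributions(H):
    """Return (lambda, rho) as dictionaries degree -> edge fraction."""
    H = np.asarray(H)
    lam = edge_degree_fractions(H.sum(axis=0).tolist())   # variable nodes: columns
    rho = edge_degree_fractions(H.sum(axis=1).tolist())   # constraint nodes: rows
    return lam, rho

H = [[1, 1, 1, 0],
     [0, 1, 1, 1]]
lam, rho = degree_distributions(H)
print(lam, rho)
```

For this matrix two of the six edges meet degree-1 variable nodes and four meet degree-2 variable nodes, so $\lambda(x) = \tfrac{1}{3} + \tfrac{2}{3}x$ and $\rho(x) = x^2$. For a regular matrix each dictionary has a single entry equal to 1, in agreement with Equation 8.11-6.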


8.11-1 Decoding LDPC Codes 

The two main algorithms used to decode LDPC codes are the bit-flipping algorithm 
and the sum-product algorithm, the latter also referred to as the belief propagation 
algorithm. The bit-flipping algorithm is a hard decision decoding algorithm with low 
complexity. The sum-product algorithm is a soft decision algorithm with higher complexity. 
We have already studied the sum-product algorithm in Section 8.10-3. Applying this 
algorithm to LDPC codes is straightforward and is based on applying Equations 8.10-31 
and 8.10-27 to the code-channel factor graph. 

The bit-flipping algorithm is a hard decision decoding algorithm. Let us assume 
that $\boldsymbol{y}$ is the hard channel output, i.e., the channel output quantized to 0 or 1. In the first 
step of the bit-flipping algorithm, the syndrome $\boldsymbol{s} = \boldsymbol{y}\boldsymbol{H}^t$ is computed. If the syndrome 
is zero, then we put $\hat{\boldsymbol{c}} = \boldsymbol{y}$ and stop. Otherwise, we consider the nonzero components 
of s corresponding to parity check equations that are not satisfied by the components 
of y. The update of y is done by flipping those components of y that appear in the 
largest number of unsatisfied parity check equations. Equivalently, these are the node 
variables that are connected to the largest number of unsatisfied constraint nodes of the 
graph of the LDPC code. After the update the syndrome is computed again, and the 
whole process is repeated for a fixed number of iterations or until the syndrome is equal 
to zero. The interested reader can refer to Lin and Costello (2004) for more details on 
bit-flipping decoding and its various forms. 
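
The steps just described can be sketched as follows. This is a minimal illustration under our own assumptions: the parity check matrix is that of a small $3 \times 3$ array code (each bit is checked by one row parity and one column parity, so $w_c = 2$ and $w_r = 3$), chosen because a single error is then always the unique bit in the largest number of failed checks. It is not an LDPC code from the text, and the function name `bit_flip_decode` is hypothetical.

```python
import numpy as np

def bit_flip_decode(H, y, max_iters=20):
    """Hard-decision bit flipping; returns (estimate, success flag)."""
    y = np.array(y, dtype=int) % 2
    for _ in range(max_iters):
        s = H @ y % 2                      # syndrome, one entry per parity check
        if not s.any():
            return y, True                 # all checks satisfied
        counts = s @ H                     # unsatisfied checks touching each bit
        y[counts == counts.max()] ^= 1     # flip the worst offenders
    return y, not (H @ y % 2).any()

# Parity checks of a 3 x 3 array code: three row checks, then three column checks.
H = np.zeros((6, 9), dtype=int)
for r in range(3):
    for c in range(3):
        H[r, 3 * r + c] = 1
        H[3 + c, 3 * r + c] = 1

y = np.zeros(9, dtype=int)
y[4] = 1                                   # all-zero codeword with one bit in error
decoded, ok = bit_flip_decode(H, y)
print(decoded.tolist(), ok)  # [0, 0, 0, 0, 0, 0, 0, 0, 0] True
```

When several bits tie for the largest count, this sketch flips all of them, which is one of several possible conventions and can fail on codes where ties are common.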


8.12 

CODING FOR BANDWIDTH-CONSTRAINED CHANNELS — TRELLIS 
CODED MODULATION 

In the treatment of block and convolutional codes, performance improvement was 
achieved by expanding the bandwidth of the transmitted signal by an amount equal to 
the reciprocal of the code rate. Recall for example that the improvement in performance 
achieved by an $(n, k)$ binary block code with soft-decision decoding is approximately 
$10 \log_{10}(R_c d_{\min} - k \ln 2/\gamma_b)$ dB compared with uncoded binary or quaternary PSK. For 
example, when $\gamma_b = 10$, the (24, 12) extended Golay code gives a coding gain of 5 dB. 
This coding gain is achieved at a cost of doubling the bandwidth of the transmitted 
signal and, of course, at the additional cost in receiver implementation complexity. 
Thus, coding provides an effective method for trading bandwidth and implementation 
complexity against transmitter power. This situation applies to digital communication 
systems that are designed to operate in the power-limited region where $R/W < 1$. 
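
The 5-dB figure quoted for the Golay code follows directly from the expression above. The short sketch below evaluates it, taking $d_{\min} = 8$ for the (24, 12) extended Golay code and reading $\gamma_b = 10$ as a linear ratio rather than a value in dB. The function name is ours.

```python
import math

def soft_decision_coding_gain_db(n, k, d_min, gamma_b):
    """Approximate gain 10*log10(Rc*d_min - k*ln(2)/gamma_b), gamma_b as a linear ratio."""
    rate = k / n
    return 10 * math.log10(rate * d_min - k * math.log(2) / gamma_b)

gain = soft_decision_coding_gain_db(n=24, k=12, d_min=8, gamma_b=10)
print(round(gain, 2))  # 5.01
```

With $R_c = 1/2$ the bandwidth expansion factor is $1/R_c = 2$, which is the doubling of bandwidth mentioned in the text.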

In this section, we consider the use of coded signals for bandwidth-constrained 
channels. For such channels, the digital communication system is designed to use 
bandwidth-efficient multilevel amplitude and phase modulation, such as PAM, PSK, 
DPSK, or QAM, and operates in the region where $R/W > 1$. When coding is applied 
to the bandwidth-constrained channel, a performance gain is desired without expanding 
the signal bandwidth. This goal can be achieved by increasing the number of signals 
over the corresponding uncoded system to compensate for the redundancy introduced 
by the code. 

For example, suppose that a system employing uncoded four-phase PSK 
modulation achieves an $R/W = 2$ (bits/s)/Hz at an error probability of $10^{-6}$. For this error rate 
the SNR per bit is $\gamma_b = 10.5$ dB. We may try to reduce the SNR per bit by use of coded 
signals, but this must be done without expanding the bandwidth. If we choose a rate 
$R_c = 2/3$ code, it must be accompanied by an increase in the number of signal points 
from four (2 bits per symbol) to eight (3 bits per symbol). Thus, the rate 2/3 code used 
in conjunction with eight-phase PSK, for example, yields the same data throughput as 
uncoded four-phase PSK. However, we recall that an increase in the number of signal 
phases from four to eight requires an additional 4 dB approximately in signal power to 
maintain the same error rate. Hence, if coding is to provide a benefit, the performance 
gain of the rate 2/3 code must overcome this 4-dB penalty. 

If the modulation is treated as a separate operation independent of the encoding, 
the use of very powerful codes (large-constraint-length convolutional codes or large- 
block-length block codes) is required to offset the loss and provide some significant 
coding gain. On the other hand, if the modulation is an integral part of the encoding 
process and is designed in conjunction with the code to increase the minimum Euclidean 
distance between pairs of coded signals, the loss from the expansion of the signal set is 
easily overcome and a significant coding gain is achieved with relatively simple codes. 
The key to this integrated modulation and coding approach is to devise an effective 
method for mapping the coded bits into signal points such that the minimum Euclidean 
distance is maximized. Such a method was developed by Ungerboeck (1982), based 
on the principle of mapping by set partitioning. We describe this principle by means of 
Examples 8.12-1 and 8.12-2. 

Set partitioning We begin with a given signal constellation, such as $M$-ary PAM, 
or QAM or PSK, and partition the constellation into subsets in a way that the minimum 
Euclidean distance between signal points in a subset is increased with each partition. The 
following two examples illustrate the set partitioning method proposed by Ungerboeck. 

EXAMPLE 8.12-1. AN 8-PSK SIGNAL CONSTELLATION. Let us partition the eight-phase 
signal constellation shown in Figure 8.12-1 into subsets of increasing minimum 
Euclidean distance. In the eight-phase signal set, the signal points are located on a 
circle of radius $\sqrt{\mathcal{E}}$ and have a minimum distance separation of 

\[ d_0 = 2\sqrt{\mathcal{E}} \sin\frac{\pi}{8} = \sqrt{(2 - \sqrt{2})\,\mathcal{E}} = 0.765\sqrt{\mathcal{E}} \]

In the first partitioning, the eight points are subdivided into two subsets of four points 
each, such that the minimum distance between points increases to $d_1 = \sqrt{2\mathcal{E}}$. In the 
second level of partitioning, each of the two subsets is subdivided into two subsets of 
two points, such that the minimum distance increases to $d_2 = 2\sqrt{\mathcal{E}}$. This results in four 
subsets of two points each. 

Finally, the last stage of partitioning leads to eight subsets, where each subset 
contains a single point. Note that each level of partitioning increases the minimum 
Euclidean distance between signal points. The results of these three stages of partition- 
ing are illustrated in Figure 8.12-1. The way in which the coded bits are mapped into 
the partitioned signal points is described below. 
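
The three distances of Example 8.12-1 are easy to confirm numerically. The sketch below places the eight points on a circle of radius $\sqrt{\mathcal{E}}$ with $\mathcal{E} = 1$ and takes every second and every fourth point as representative subsets. The point numbering and the helper name `min_distance` are our own choices for illustration.

```python
from itertools import combinations
import numpy as np

E = 1.0
points = np.sqrt(E) * np.exp(2j * np.pi * np.arange(8) / 8)   # 8-PSK on the unit circle

def min_distance(indices):
    """Smallest Euclidean distance between any two points of the given subset."""
    return min(abs(points[a] - points[b]) for a, b in combinations(indices, 2))

d0 = min_distance(range(8))         # full constellation
d1 = min_distance(range(0, 8, 2))   # one four-point subset after the first partition
d2 = min_distance(range(0, 8, 4))   # one two-point subset after the second partition
print(round(d0, 3), round(d1, 3), round(d2, 3))  # 0.765 1.414 2.0
```

By symmetry the other subsets at each level are rotated copies and have the same minimum distance.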

EXAMPLE 8.12-2. A 16-QAM SIGNAL CONSTELLATION. The 16-point rectangular signal 
constellation shown in Figure 8.12-2 is first divided into two subsets by assigning 
alternate points to each subset as illustrated in the figure. Thus, the distance between 
points is increased from $2\sqrt{\mathcal{E}}$ to $2\sqrt{2\mathcal{E}}$ by the first partitioning. Further partitioning of 
the two subsets leads to greater separation in Euclidean distance between signal points 
as illustrated in Figure 8.12-2. It is interesting to note that for the rectangular signal 
constellations, each level of partitioning increases the minimum Euclidean distance by 
$\sqrt{2}$, i.e., $d_{i+1}/d_i = \sqrt{2}$ for all $i$. 

FIGURE 8.12-1 

Set partitioning of an 8-PSK signal set. 

FIGURE 8.12-2 

Set partitioning of 16-QAM signal. 

FIGURE 8.12-3 

General structure of combined 
encoder/modulator. 
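
The $\sqrt{2}$ growth per level for a rectangular constellation can be checked on a $4 \times 4$ grid. In the sketch below the grid spacing is $2\sqrt{\mathcal{E}}$ with $\mathcal{E} = 1$, and the membership rules in `levels` describe the subset containing the corner point at each level. The checkerboard-style rules are our assumption about how the alternate points are assigned.

```python
from itertools import combinations
import numpy as np

E = 1.0
spacing = 2 * np.sqrt(E)
grid = [(i, j) for i in range(4) for j in range(4)]   # integer coordinates of 16-QAM

# Membership tests for the subset containing the point (0, 0) at each level.
levels = [
    lambda i, j: True,                                               # all 16 points
    lambda i, j: (i + j) % 2 == 0,                                   # 8 points
    lambda i, j: i % 2 == 0 and j % 2 == 0,                          # 4 points
    lambda i, j: i % 2 == 0 and j % 2 == 0 and (i // 2 + j // 2) % 2 == 0,  # 2 points
]

def min_distance(subset):
    """Smallest Euclidean distance between any two points of the subset."""
    return min(spacing * np.hypot(a[0] - b[0], a[1] - b[1])
               for a, b in combinations(subset, 2))

distances = [min_distance([p for p in grid if keep(*p)]) for keep in levels]
ratios = [distances[k + 1] / distances[k] for k in range(len(distances) - 1)]
print(np.round(distances, 3), np.round(ratios, 3))
```

The computed distances are $2\sqrt{\mathcal{E}}$, $2\sqrt{2\mathcal{E}}$, $4\sqrt{\mathcal{E}}$, and $4\sqrt{2\mathcal{E}}$, so each ratio equals $\sqrt{2}$.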

In these two examples, the partitioning was carried out to the limit where each 
subset contains only a single point. In general, this may not be necessary. For example, 
the 16-point QAM signal constellation may be partitioned only twice, to yield four 
subsets of four points each. Similarly, the eight-phase PSK signal constellation can be 
partitioned twice, to yield four subsets of two points each. 

Trellis-coded modulation (TCM) The degree to which the signal is partitioned 
depends on the characteristics of the code. In general, the encoding process is performed 
as illustrated in Figure 8.12-3. A block of $m$ information bits is separated into two groups 
of length $k_1$ and $k_2$, respectively. The $k_1$ bits are encoded into $n$ bits, while the $k_2$ bits 
are left uncoded. Then, the $n$ bits from the encoder are used to select one of the possible 
subsets in the partitioned signal set, while the $k_2$ bits are used to select one of $2^{k_2}$ signal 
points in each subset. When $k_2 = 0$, all $m$ information bits are encoded. 

The assignment of signal subsets to state transitions in the trellis is based on three 
heuristic rules devised by Ungerboeck (1982). The rules are 

1. Use all subsets with equal frequency in the trellis. 

2. Transitions originating from the same state or merging into the same state in the 
trellis are assigned subsets that are separated by the largest Euclidean distance. 

3. Parallel state transitions (when they occur) are assigned signal points separated by 
the largest Euclidean distance. Parallel transitions in the trellis are characteristic of 
TCM that contains one or more uncoded information bits. 

EXAMPLE 8.12-3. Consider the use of the rate 1/2 convolutional encoder shown in 
Figure 8.12-4a to encode one information bit while the second information bit is left 
uncoded. This code results in the four-state trellis shown in Figure 8.12-4b. When 
used in conjunction with an eight-point signal constellation, such as eight-point PSK 
or QAM, the two encoded output bits are used to select one of the four subsets in the 
partitioned signal constellation, while the remaining information bit is used to select 
one of the two points within each subset. Let us use the eight-point PSK constellation 
to complete this example. The four subsets assigned to the trellis in Figure 8.12-4b 
correspond to the subsets labeled $C_0$, $C_1$, $C_2$, $C_3$ in Figure 8.12-1. Note that the 
Euclidean distance of points within any subset is $d_2 = 2\sqrt{\mathcal{E}}$ and the largest minimum 
distance between signal points in any pair of subsets is $d_1 = \sqrt{2\mathcal{E}}$. The mappings 
of the coded bits $(c_2, c_1)$ and the uncoded bit $c_3$ to the state transitions, using the 
convention $(c_3, c_2, c_1)$, are shown in Figure 8.12-4c. We note that each trellis state has 
two parallel transitions, corresponding to the two possible values of the uncoded bit. The 
phase assignments in the eight-point PSK constellation are shown in Figure 8.12-4d. 
It should be noted that the mapping of the bits $(c_3, c_2, c_1)$ into the eight signal points 
in the constellation is not unique. Several other mappings are possible. For example, 
an equally good mapping is obtained if the four-point subsets $B_0$ and $B_1$ shown in 
Figure 8.12-1 are interchanged, so that the signal points in the subsets $C_0$, $C_1$, $C_2$, and 
$C_3$ will also change. 

FIGURE 8.12-4 

Four-state trellis-coded modulation with 8-PSK signal constellation. 
(c) Mapping of bits to state transitions. 
(d) Mapping of bits $(c_3, c_2, c_1)$ to signal points corresponding to 
partition in Fig. 8.3-1 (note nonuniqueness of this mapping). 

In general, the number of states $S = 2^{\nu}$ in the code trellis is a function of the number 
of memory elements in the encoder. Hence, we may increase the number of trellis states 
while maintaining the same code rate. For example, Figure 8.12-5 illustrates a rate 
2/3 code that has eight trellis states. In this case, both information bits are coded. 

Let us now evaluate the performance of the trellis-coded 8-PSK and compare its 
performance with that of uncoded 4-PSK, which we use as a reference in measuring the 
coding gain of the trellis-coded modulation. Uncoded 4-PSK employs the signal points 
in either subset $B_0$ or $B_1$ of Figure 8.12-1, for which the minimum distance of the signal 
points is $\sqrt{2\mathcal{E}}$. Note that the 4-PSK signal corresponds to a trivial one-state trellis with 
four parallel state transitions, as shown in Figure 8.12-6. The subsets $D_0$, $D_2$, $D_4$, and 
$D_6$ in Figure 8.12-1 are used as the signal points for the purpose of illustration. 
FIGURE 8.12-5 

Rate 2/3, eight-state trellis code. 


For the trellis-coded 8-PSK modulation, we use the four-state trellis shown in 
Figure 8.12-4b and c. We observe that each branch in the trellis corresponds to one of 
the four subsets $C_0$, $C_1$, $C_2$, or $C_3$. As indicated above, for the eight-point 
constellation, each of the subsets $C_0$, $C_1$, $C_2$, and $C_3$ contains two signal points. Hence, the state 
transition $C_0$ contains the two signal points corresponding to the bits $(c_3 c_2 c_1) = (000)$ 
and (100), or (0, 4) in octal representation. Similarly, $C_2$ contains the two signal points 
corresponding to (010) and (110) or to (2, 6) in octal, $C_1$ contains the points 
corresponding to (001) and (101) or (1, 5) in octal, and $C_3$ contains the points corresponding 
to (011) and (111) or (3, 7) in octal. Thus, each transition in the four-state trellis 
contains two parallel paths, as previously indicated. As shown in Figure 8.12-6, any two 
signal paths that diverge from one state and remerge at the same state after more than 
one transition have a squared Euclidean distance of $d_0^2 + 2d_1^2 = d_0^2 + d_2^2$ between 
them. For example, the signal paths 0, 0, 0 and 2, 1, 2 are separated by $d_0^2 + d_2^2 = 
[(0.765)^2 + 4]\mathcal{E} = 4.585\mathcal{E}$. On the other hand, the squared Euclidean distance between 
parallel transitions is $d_2^2 = 4\mathcal{E}$. Hence, the minimum Euclidean distance separation 
between paths that diverge from any state and remerge at the same state in the 
four-state trellis is $d_2 = 2\sqrt{\mathcal{E}}$. The minimum distance in the trellis code is called the free 
Euclidean distance and denoted by $D_{\text{fed}}$. 

FIGURE 8.12-6 

Uncoded 4-PSK and trellis-coded 8-PSK modulation. 

In the four-state trellis of Figure 8.12-6b, $D_{fed} = 2\sqrt{\mathcal{E}}$. When compared with the
Euclidean distance $d_0 = \sqrt{2\mathcal{E}}$ for the uncoded 4-PSK modulation, we observe that the
four-state trellis code gives a coding gain of 3 dB.
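
The distances used in this comparison can be checked numerically. The following is a minimal sketch, assuming unit symbol energy and the natural octal labeling of the 8-PSK points (label k at angle 2πk/8); the function names are illustrative and are not taken from the text.

```python
import math

E = 1.0  # symbol energy, normalized to 1 (an assumption for illustration)

def psk_point(k, M=8, energy=E):
    """Return the k-th point of an M-PSK constellation as a complex number."""
    angle = 2 * math.pi * k / M
    return math.sqrt(energy) * complex(math.cos(angle), math.sin(angle))

def sq_dist(a, b):
    """Squared Euclidean distance between the 8-PSK points labeled a and b."""
    return abs(psk_point(a) - psk_point(b)) ** 2

# Minimum squared distances at the three partition levels of 8-PSK.
d0_sq = sq_dist(0, 1)  # adjacent points: (2 - sqrt(2)) E, about 0.586 E
d1_sq = sq_dist(0, 2)  # points 90 degrees apart: 2 E
d2_sq = sq_dist(0, 4)  # antipodal points: 4 E

# Squared distance between the signal paths 0,0,0 and 2,1,2 quoted in the text.
path_sq = sum(sq_dist(a, b) for a, b in zip((0, 0, 0), (2, 1, 2)))

# Following the text, the free distance of the four-state code is the smaller of
# the parallel-transition distance and the multi-branch path distance.
d_fed_sq = min(d2_sq, path_sq)
uncoded_4psk_sq = 2 * E  # squared minimum distance of uncoded 4-PSK
gain_db = 10 * math.log10(d_fed_sq / uncoded_4psk_sq)

print(f"d0^2 = {d0_sq:.3f}E, d1^2 = {d1_sq:.3f}E, d2^2 = {d2_sq:.3f}E")
print(f"path 0,0,0 vs 2,1,2: {path_sq:.3f}E, coding gain = {gain_db:.2f} dB")
```

The path distance (about 4.586 in units of the symbol energy) exceeds the parallel-transition distance of 4, which is why the parallel transitions set the free distance of this code.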

We should emphasize that the four-state trellis code illustrated in Figure 8.12-6b 
is optimum in the sense that it provides the largest free Euclidean distance. Clearly, 
many other four-state trellis codes can be constructed, including the one shown in 
Figure 8.12-7, which consists of four distinct transitions from each state to all other 
states. However, neither this code nor any of the other possible four-state trellis codes 
gives a larger $D_{fed}$.

In the four-state trellis code, the parallel transitions were separated by the Euclidean
distance $2\sqrt{\mathcal{E}}$, which is also $D_{fed}$. Hence, the coding gain of 3 dB is limited by the
distance of the parallel transitions. Larger gains in performance relative to uncoded
4-PSK can be achieved by using trellis codes with more states, which allow for the
elimination of the parallel transitions. Thus, trellis codes with eight or more states
would use distinct transitions to obtain a larger $D_{fed}$.

For example, in Figure 8.12-8, we illustrate an eight-state trellis code due to 
Ungerboeck (1982) for the 8-PSK signal constellation. The state transitions for maxi- 
mizing the free Euclidean distance were determined from application of the three basic 
rules given above. In this case, note that the minimum squared Euclidean distance is

$$D_{fed}^2 = d_0^2 + 2d_1^2 = 4.585\mathcal{E}$$

which, when compared with $d_0^2 = 2\mathcal{E}$ for uncoded 4-PSK, represents a gain of
3.6 dB. Ungerboeck (1982, 1987) has also found rate 2/3 trellis codes with 16, 32,



FIGURE 8.12-7 

An alternative four-state trellis code. 




FIGURE 8.12-8

Eight-state trellis code for coded 8-PSK modulation.


64, 128, and 256 states that achieve coding gains ranging from 4 to 5.75 dB for 8-PSK 
modulation. 
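
As a quick check on the 3.6-dB figure quoted for the eight-state code, the asymptotic coding gain relative to uncoded 4-PSK follows from the ratio of squared distances at equal symbol energy $\mathcal{E}$. The symbol $G$ is introduced here only for illustration.

```latex
G = 10\log_{10}\frac{D_{fed}^2}{d_0^2}
  = 10\log_{10}\frac{4.585\,\mathcal{E}}{2\,\mathcal{E}}
  = 10\log_{10}(2.2925) \approx 3.60\ \text{dB}
```

The same expression with $D_{fed}^2 = 4\mathcal{E}$ reproduces the 3-dB gain of the four-state code.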

The basic principle of set partitioning is easily extended to larger PSK signal 
constellations that yield greater bandwidth efficiency. For example, 3 (bits/s)/Hz can 
be achieved with either uncoded 8-PSK or with trellis-coded 16-PSK modulation. 
Ungerboeck (1987) has devised trellis codes and has evaluated the coding gains achieved
by simple rate 1/2 and rate 2/3 convolutional codes for the 16-PSK signal constellations.
The results are summarized below.

Soft-decision Viterbi decoding for trellis-coded modulation is accomplished in two 
steps. Since each branch in the trellis corresponds to a signal subset, the first step in 
decoding is to determine the best signal point within each subset, i.e., the point in each 
subset that is closest in distance to the received point. We may call this subset decoding. 
In the second step, the signal point selected from each subset and its squared distance 
metric are used for the corresponding branch in the Viterbi algorithm to determine the 
signal path through the code trellis that has the minimum sum of squared distances 
from the sequence of received (noisy channel output) signals. 
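
The first of the two decoding steps can be sketched as follows. This is a minimal illustration, assuming unit-energy 8-PSK with the subsets $C_0$ to $C_3$ given by the octal label pairs (0, 4), (1, 5), (2, 6), and (3, 7); the function names and the sample received value are hypothetical. The second step (the Viterbi search itself) is not shown.

```python
import math

def psk8(k, energy=1.0):
    """8-PSK point with octal label k, as a complex number."""
    angle = 2 * math.pi * k / 8
    return math.sqrt(energy) * complex(math.cos(angle), math.sin(angle))

# Two-point subsets of 8-PSK, by octal label.
SUBSETS = {"C0": (0, 4), "C1": (1, 5), "C2": (2, 6), "C3": (3, 7)}

def subset_decode(r):
    """Step 1 (subset decoding): for each subset, return the label of the
    point closest to the received sample r and its squared distance."""
    result = {}
    for name, labels in SUBSETS.items():
        best = min(labels, key=lambda k: abs(r - psk8(k)) ** 2)
        result[name] = (best, abs(r - psk8(best)) ** 2)
    return result

# Hypothetical noisy received sample near the point labeled 0.
metrics = subset_decode(0.9 + 0.2j)
for name, (label, metric) in metrics.items():
    print(name, label, round(metric, 4))
```

Each (label, metric) pair then serves as the branch metric for every trellis branch carrying that subset, so the Viterbi algorithm only has to compare one candidate per branch.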

The error rate performance of the trellis-coded signals in the presence of additive 
Gaussian noise can be evaluated by following the procedure described in Section 8.2 for 
convolutional codes. Recall that this procedure involves the computation of the probability
of error for all different error events and summing these error event probabilities
to obtain a union bound on the first-event error probability. Note, however, that at high
SNR, the first-event error probability is dominated by the leading term, which has the
minimum distance $D_{fed}$. Consequently, at high SNR, the first-event error probability is
well approximated as

$$P_e \approx N_{fed}\, Q\!\left(\sqrt{\frac{D_{fed}^2}{2N_0}}\right) \qquad (8.12\text{-}1)$$

where $N_{fed}$ denotes the number of signal sequences with distance $D_{fed}$ that diverge at
any state and remerge at that state after one or more transitions.

In computing the coding gain achieved by trellis-coded modulation, we usually
focus on the gain achieved by increasing $D_{fed}$ and neglect the effect of $N_{fed}$. However,
trellis codes with a large number of states may result in a large $N_{fed}$ that cannot be
ignored in assessing the overall coding gain.
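
The effect of the path multiplicity can be illustrated numerically. The sketch below assumes the standard AWGN form of the high-SNR approximation, $P_e \approx N_{fed}\,Q(\sqrt{D_{fed}^2/2N_0})$, with noise power spectral density $N_0/2$; the numerical values of $N_0$ and the multiplicities are hypothetical and chosen only for illustration.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def first_event_error_prob(n_fed, d_fed_sq, n0):
    """High-SNR approximation N_fed * Q(sqrt(D_fed^2 / (2 N0)))."""
    return n_fed * q_function(math.sqrt(d_fed_sq / (2.0 * n0)))

# Hypothetical comparison: same free distance, different multiplicities.
n0 = 0.25
few_paths = first_event_error_prob(n_fed=2, d_fed_sq=4.0, n0=n0)
many_paths = first_event_error_prob(n_fed=64, d_fed_sq=4.0, n0=n0)
ratio = many_paths / few_paths

print(f"N_fed = 2:  P_e ~ {few_paths:.3e}")
print(f"N_fed = 64: P_e ~ {many_paths:.3e} (factor {ratio:.0f})")
```

Because the multiplicity enters linearly while the distance enters through the exponentially decaying Q-function, a large multiplicity matters most at moderate SNR and becomes less significant as the SNR grows.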

In addition to the trellis-coded PSK modulations described above, powerful trellis 
codes have also been developed for PAM and QAM signal constellations. Of particular 
practical importance is the class of trellis-coded two-dimensional rectangular signal 
constellations. Figure 8.12-9 illustrates these signal constellations for M-QAM where 
M = 16, 32, 64, and 128. The M = 32 and 128 constellations have a cross pattern 
and are sometimes called cross-constellations. The underlying rectangular grid
containing the signal points in M-QAM is called a lattice of type $Z_2$ (the subscript indicates
the dimensionality of the space). When set partitioning is applied to this class of
signal constellations, the ratio of the minimum Euclidean distances between successive
partitions is $d_{i+1}/d_i = \sqrt{2}$ for all $i$, as previously observed in Example 8.12-2.
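
The $\sqrt{2}$ growth of the minimum distance can be verified on a small example. The sketch below is an illustration under stated assumptions: it uses a 16-point square grid with unit spacing and follows only one subset at each partition level (by symmetry, the sibling subsets have the same minimum distance). The specific splitting rules are one possible choice, not necessarily the labeling used in the figures.

```python
import itertools
import math

def min_distance(points):
    """Minimum Euclidean distance over all pairs of points."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))

# 16-QAM on an integer grid with unit spacing.
level0 = [(i, j) for i in range(4) for j in range(4)]
# First partition: checkerboard split.
level1 = [p for p in level0 if (p[0] + p[1]) % 2 == 0]
# Second partition: keep points with even coordinates.
level2 = [p for p in level1 if p[0] % 2 == 0]
# Third partition: checkerboard split of the coarser grid.
level3 = [p for p in level2 if (p[0] // 2 + p[1] // 2) % 2 == 0]

dists = [min_distance(s) for s in (level0, level1, level2, level3)]
ratios = [dists[i + 1] / dists[i] for i in range(len(dists) - 1)]

print("subset sizes:", [len(s) for s in (level0, level1, level2, level3)])
print("min distances:", [round(d, 4) for d in dists])
print("ratios d_{i+1}/d_i:", [round(r, 4) for r in ratios])
```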

Figure 8.12-10 illustrates an eight-state trellis code that can be used with any of the
M-QAM rectangular signal constellations for which $M = 2^k$, where $k = 4, 5, 6, \ldots$.
With the eight-state trellis, we associate eight signal subsets, so that any of the


FIGURE 8.12-9

Rectangular two-dimensional (QAM) signal constellations.

FIGURE 8.12-10

Eight-state trellis for rectangular QAM signal constellations.


M-QAM signal sets for $M \ge 16$ are suitable. For $M = 2^{m+1}$, two input bits ($k_1 = 2$)
are encoded into $n = 3$ ($n = k_1 + 1$) bits that are used to select one of the eight subsets.
The additional $k_2 = m - k_1$ input bits are used to select signal points within a subset, and
result in parallel transitions in the eight-state trellis. Hence, 16-QAM with an eight-state
trellis involves two parallel transitions in each branch of the trellis. More generally, the
choice of an $M = 2^{m+1}$-point QAM signal constellation implies that the eight-state
trellis contains $2^{m-2}$ parallel transitions in each branch.
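
The bookkeeping in this paragraph can be written out as a small helper. This is a sketch only; the function name is hypothetical, and it assumes $M = 2^{m+1}$ with $k_1$ coded bits and $k_2 = m - k_1$ uncoded bits per signaling interval.

```python
def parallel_transitions(M, k1=2):
    """Parallel transitions per trellis branch for an M = 2**(m+1) point
    constellation when k1 of the m input bits are convolutionally encoded."""
    if M < 2 or M & (M - 1):
        raise ValueError("M must be a power of two")
    m = M.bit_length() - 2   # M = 2**(m+1)  ->  m = log2(M) - 1
    k2 = m - k1              # uncoded bits select a point within a subset
    if k2 < 0:
        raise ValueError("constellation too small for k1 coded bits")
    return 2 ** k2

for M in (16, 32, 64, 128):
    print(f"{M}-QAM: {parallel_transitions(M)} parallel transitions per branch")
```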

The assignment of signal subsets to transitions is based on the same set of basic
(heuristic) rules described above for the 8-PSK signal constellation. Thus, for the
eight-state trellis, the four transitions (branches) originating from or leading to the same
state are assigned either the subsets $D_0$, $D_2$, $D_4$, $D_6$ or the subsets $D_1$, $D_3$,
$D_5$, $D_7$. Parallel transitions are assigned signal points contained within the
corresponding subsets. This eight-state trellis code provides a coding gain of 4 dB. The
Euclidean distance of parallel transitions exceeds the free Euclidean distance, and, hence,
the code performance is not limited by parallel transitions.

Larger size trellis codes for M-QAM provide even larger coding gains. For example,
trellis codes with $2^{\nu}$ states for an $M = 2^{m+1}$ QAM signal constellation can be
constructed by convolutionally encoding $k_1$ input bits into $k_1 + 1$ output bits. Thus, a
rate $R_c = k_1/(k_1 + 1)$ convolutional code is employed for this purpose. Usually, the
choice of $k_1 = 2$ provides a significant fraction of the total coding gain that is achievable.
The additional $k_2 = m - k_1$ input bits are uncoded and are transmitted in each
signal interval by selecting signal points within a subset.
TABLE 8.12-1

Coding Gains for Trellis-Coded PAM Signals

Number of   k_1   Code rate     m = 1: coding gain (dB)   m = 2: coding gain (dB)   m → ∞: asymptotic   m → ∞:
states            k_1/(k_1+1)   of 4-PAM versus           of 8-PAM versus           coding gain (dB)    N_fed
                                uncoded 2-PAM             uncoded 4-PAM

    4        1    1/2           2.55                      3.31                      3.52                 4
    8        1    1/2           3.01                      3.77                      3.97                 4
   16        1    1/2           3.42                      4.18                      4.39                 8
   32        1    1/2           4.15                      4.91                      5.11                12
   64        1    1/2           4.47                      5.23                      5.44                36
  128        1    1/2           5.05                      5.81                      6.02                66

Source: Ungerboeck (1987).


Tables 8.12-1 to 8.12-3, taken from the paper by Ungerboeck (1987), provide a
summary of coding gains achievable with trellis-coded modulation. Table 8.12-1
summarizes the coding gains achieved for trellis-coded (one-dimensional) PAM modulation
with rate 1/2 trellis codes. Note that the coding gain with a 128-state trellis code is
5.8 dB for octal PAM, which is close to the channel cutoff rate $R_0$ and less than 4 dB
from the channel capacity limit for error rates in the range of $10^{-6}$ to $10^{-8}$. We should
also observe that the number of paths $N_{fed}$ with free Euclidean distance $D_{fed}$ becomes
large with an increase in the number of states.

Table 8.12-2 lists the coding gain for trellis-coded 16-PSK. Again, we observe that 
the coding gain for eight or more trellis states exceeds 4 dB, relative to uncoded 8-PSK. 
A simple rate 1/2 code yields 5.33 dB gain with a 128-state trellis.

Table 8.12-3 contains the coding gains obtained with trellis-coded QAM signals. 
Relatively simple rate 2/3 trellis codes yield a gain of 6 dB with 128 trellis states for 
m = 3 and 4. 

The results in these tables clearly illustrate the significant coding gains that are
achievable with relatively simple trellis codes. A 6-dB coding gain is close to the cutoff
rate $R_0$ for the signal sets under consideration. Additional gains that would lead to


TABLE 8.12-2

Coding Gains for Trellis-Coded 16-PSK Modulation

Number of   k_1   Code rate     m = 3: coding gain (dB)   m → ∞:
states            k_1/(k_1+1)   of 16-PSK versus          N_fed
                                uncoded 8-PSK

    4        1    1/2           3.54                       4
    8        1    1/2           4.01                       4
   16        1    1/2           4.44                       8
   32        1    1/2           5.13                       8
   64        1    1/2           5.33                       2
  128        1    1/2           5.33                       2
  256        2    2/3           5.51                       8

Source: Ungerboeck (1987).
TABLE 8.12-3

Coding Gains for Trellis-Coded QAM Modulation

Number of   k_1   Code rate     m = 3: gain (dB)   m = 4: gain (dB)   m = 5: gain (dB)   m = ∞: asymptotic   N_fed
states            k_1/(k_1+1)   of 16-QAM versus   of 32-QAM versus   of 64-QAM versus   coding gain (dB)
                                uncoded 8-QAM      uncoded 16-QAM     uncoded 32-QAM

    4        1    1/2           3.01               3.01               2.80               3.01                  4
    8        2    2/3           3.98               3.98               3.77               3.98                 16
   16        2    2/3           4.77               4.77               4.56               4.77                 56
   32        2    2/3           4.77               4.77               4.56               4.77                 16
   64        2    2/3           5.44               5.44               5.23               5.44                 56
  128        2    2/3           6.02               6.02               5.81               6.02                344
  256        2    2/3           6.02               6.02               5.81               6.02                 44

Source: Ungerboeck (1987).


transmission in the vicinity of the channel capacity bound are difficult to attain without
a significant increase in coding/decoding complexity. Continued partitioning of large
signal sets quickly leads to signal point separation within any subset that exceeds the
free Euclidean distance of the code. In such cases, parallel transitions are no longer
the limiting factor on $D_{fed}$. Usually, a partition to eight subsets is sufficient to obtain a
coding gain of 5-6 dB with simple rate 1/2 or rate 2/3 trellis codes with either 64 or
128 trellis states, as indicated in Tables 8.12-1 to 8.12-3.

Convolutional encoders for the linear trellis codes listed in Tables 8.12-1 to 8.12-3 
for the M-PAM, M-PSK, and M-QAM signal constellations are given in the papers by
Ungerboeck (1982, 1987). The encoders may be realized either with feedback or without
feedback. For example, Figure 8.12-11 illustrates three feedback-free convolutional
encoders corresponding to 4-, 8-, and 16-state trellis codes for 8-PSK and 16-QAM 
signal constellations. Equivalent realizations of these trellis codes based on system- 
atic convolutional encoders with feedback are shown in Figure 8.12-12. Usually, the 
systematic convolutional encoders are preferred in practical applications. 

A potential problem with linear trellis codes is that the modulated signal sets are not 
usually invariant to phase rotations. This poses a problem in practical applications where 
differential encoding is usually employed to avoid phase ambiguities when a receiver 
must recover the carrier phase after a temporary loss of signal. For two-dimensional 
signal constellations, it is possible to achieve 180° phase invariance by use of a linear
trellis code. However, it is not possible to achieve 90° phase invariance with a linear
code. In such a case, a non-linear code must be used. The problem of phase invariance
and differential encoding/decoding was solved by Wei (1984a, b), who devised
linear and non-linear trellis codes that are rotationally invariant under either 180° or
90° phase rotations, respectively. For example, Figure 8.12-13 illustrates a non-linear
eight-state convolutional encoder for a 32-QAM rectangular signal constellation that
is invariant under 90° phase rotations. This trellis code has been adopted as an
international standard (V.32 and V.33) for 9600 and 14,400 bits/s (high-speed) telephone line
modems.
FIGURE 8.12-11

Minimal feedback-free convolutional encoders for 8-PSK and 16-QAM signals: (a) 4-state
encoder; (b) 8-state encoder; (c) 16-state encoder. [From Ungerboeck (1982). © 1982 IEEE.]

Trellis-coded modulation schemes have also been developed for multidimensional 
signals. In practical systems, multidimensional signals are transmitted as a sequence of 
either one-dimensional (PAM) or two-dimensional (QAM) signals. Trellis codes based 
on 4-, 8-, and 16-dimensional signal constellations have been constructed, and some of 
these codes have been implemented in commercially available modems. A potential ad- 
vantage of trellis-coded multidimensional signals is that we can use smaller constituent 
two-dimensional signal constellations that allow for a trade-off between coding gain 
and implementation complexity. For example, a 16-state linear four-dimensional code, 
also designed by Wei (1987), is currently used as one of the codes for the V.34 tele- 
phone modem standard. The constituent two-dimensional signal constellation contains 
a maximum of 1664 signal points. The modem can transmit as many as 10 bits per 
symbol (eight uncoded bits) to achieve data rates as high as 33,600 bits/s. The papers 
by Wei (1987), Ungerboeck (1987), Gersho and Lawrence (1984), and Forney et al. 
(1984) treat multidimensional signal constellations for trellis-coded modulation. 


8.12-1 Lattices and Trellis Coded Modulation 

The set partitioning principles used in trellis coded modulation and the coding scheme 
based on set partitioning can be formulated in terms of lattices. We have defined lattices 

(a) 4-state encoder




FIGURE 8.12-12 

Equivalent realizations of systematic convolutional encoders with feedback for 8-PSK and 
16-QAM. [From Ungerboeck (1982). © 1982 IEEE.] 


and sublattices in Section 4.7. If $\Lambda'$ is a sublattice of lattice $\Lambda$ and $c \in \Lambda$ is arbitrary,
we can define a shift of $\Lambda'$ by $c$, denoted by $\Lambda' + c$, as the set of points of $\Lambda'$ when
each is shifted by $c$. The result is called a coset of $\Lambda'$ in $\Lambda$. If $c$ is a member of $\Lambda'$,
then the coset is simply $\Lambda'$. The union of all distinct cosets of $\Lambda'$ generates $\Lambda$; hence
$\Lambda$ can be partitioned into cosets, where each coset is a shifted version of $\Lambda'$. The set
of distinct cosets generated this way is denoted by $\Lambda/\Lambda'$. Each element of $\Lambda/\Lambda'$ is a
coset that can be represented by some $c \in \Lambda$; this element of the lattice is called the coset
representative. The reader can compare this notion to the discussion of the standard array
and cosets in linear block codes in Section 7.5 and notice the close relation.
Coset representatives are similar to coset leaders. The set of coset representatives is
denoted by $[\Lambda/\Lambda']$, and the number of distinct cosets, called the order of the partition, is
denoted by $|\Lambda/\Lambda'|$. From this discussion we conclude that a lattice $\Lambda$ can be partitioned
into cosets and be written as the union of the cosets as

$$\Lambda = \bigcup_{i=1}^{L} \{c_i + \Lambda'\} = [\Lambda/\Lambda'] + \Lambda' \qquad (8.12\text{-}2)$$

where $L = |\Lambda/\Lambda'|$ is the partition order. This relation is called the coset decomposition
of lattice $\Lambda$ in terms of cosets of lattice $\Lambda'$.
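
A small concrete case may help fix the notation. The sketch below assumes the lattice is the integer grid $Z^2$ and the sublattice is $2Z^2$ (both chosen here only for illustration), and it works on a finite window of the lattice, since the lattice itself is infinite. The function name is hypothetical.

```python
import itertools

def coset_representative(point, scale=2):
    """Map a point of Z^2 to the representative of its coset of (scale*Z)^2."""
    return tuple(x % scale for x in point)

# Finite 8-by-8 window of the (infinite) lattice Z^2.
window = list(itertools.product(range(-4, 4), repeat=2))

# Group the window points by coset; the keys are the coset representatives.
cosets = {}
for p in window:
    cosets.setdefault(coset_representative(p), []).append(p)

order = len(cosets)  # order of the partition for this choice of sublattice
print("coset representatives:", sorted(cosets))
print("partition order:", order)
print("points per coset in the window:", {r: len(pts) for r, pts in cosets.items()})
```

Here the partition order is $4 = 2^2$, so a code with $n = 2$ output bits could select among the four cosets, in the manner of set partitioning.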

The set partitioning of a constellation can be compared with the coset decomposition
of a lattice. Let us assume a lattice $\Lambda$ is decomposed using sublattice $\Lambda'$ such that
FIGURE 8.12-13

Eight-state non-linear convolutional encoder for 32-QAM signal set that exhibits invariance 
under 90° phase rotations. 


the order of the partition $|\Lambda/\Lambda'|$ is equal to $2^n$; then each coset can serve as one of the
partitions used in Ungerboeck's set partitioning. An $(n, k_1)$ code is used to encode $k_1$
information bits into a binary sequence of length $n$, which selects one of the $2^n$ cosets in
the lattice decomposition. The uncoded bits are used to select a point in the coset.
Note that the number of elements in a coset is equal to the number of elements of the
sublattice $\Lambda'$, which is infinite; selection of a point in the coset determines the signal
FIGURE 8.12-14

Encoder for concatenation of a PCCC (turbo code) with TCM. 

space boundary, thus determining the shaping. The total coding gain can then be defined 
as the product of two factors, the fundamental coding gain and the shaping gain. The 
shaping gain measures the amount of power reduction resulting from using a close to 
spherically shaped boundary and is independent of the convolutional code and the
lattice used. The value of the shaping gain is limited to 1.53 dB as was discussed in 
Section 4.7. The interested reader is referred to Forney (1988). 
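
For reference, the 1.53-dB limit corresponds to the well-known ultimate shaping gain $\pi e/6$, and a product of gain factors becomes a sum when expressed in decibels. The symbols $\gamma$, $\gamma_c$, and $\gamma_s$ are introduced here only for illustration and are not notation taken from this section.

```latex
\gamma = \gamma_c \,\gamma_s
\quad\Longrightarrow\quad
\gamma\,[\text{dB}] = \gamma_c\,[\text{dB}] + \gamma_s\,[\text{dB}],
\qquad
\gamma_s \le \frac{\pi e}{6} \approx 1.423
\quad\Longrightarrow\quad
10\log_{10}\frac{\pi e}{6} \approx 1.53\ \text{dB}
```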


8.12-2 Turbo-Coded Bandwidth Efficient Modulation 

The performance of TCM can be further improved by code concatenation. There are 
several different methods described in the literature. We shall briefly describe two 
schemes for code concatenation using parallel concatenated codes, which we simply 
refer to as turbo coding. 

In one scheme, described in the paper by Le Goff et al. (1994), the information 
sequence is fed to a binary turbo encoder that employs a parallel concatenation of 
a component convolutional code with interleaving to generate a systematic binary 
turbo code. As shown in Figure 8.12-14, the output of the turbo encoder is ultimately 
connected to the signal mapper after the binary sequence from the turbo code has 
been appropriately multiplexed, the parity bit sequence has been punctured to achieve 
the desired code rate, and the data and parity sequences have been interleaved. Gray 
mapping is typically used in mapping coded bits to modulation signal points, separately 
for the in-phase (I) and quadrature (Q) signal components.

Figure 8.12-15 illustrates the block diagram of the decoder for this turbo coding 
scheme. Based on each received I and Q symbol, the receiver computes the loga- 
rithm of the likelihood ratio or the MAP of each systematic bit and each parity bit. 



FIGURE 8.12-15 

Decoder for concatenated PCCC/TCM code. 
After deinterleaving, depuncturing, and demultiplexing of these logarithmic metrics,
the systematic and parity bit information are fed to the standard binary turbo decoder. 

This scheme for constructing turbo-coded bandwidth efficient modulation imposes 
no constraints on the type or size of the signal constellation. In addition, this scheme can 
be matched to any conventional binary turbo code. In fact, this scheme is also suitable 
if the turbo code is replaced by a serially concatenated convolutional code. 

A second scheme employs a conventional Ungerboeck trellis code with interleav- 
ing to yield a parallel concatenated TCM. The basic configuration of the turbo TCM 
encoder, as described in the paper by Robertson and Worz (1998), is illustrated in Fig- 
ure 8.12-16. To avoid a rate loss, the parity sequence is punctured, as described below, 
in such a way that all information bits are transmitted only once, and the parity bits from 
the two encoders are alternately punctured. The block interleaver operates on groups
of $m - 1$ information bits, where the signal constellation consists of $2^m$ signal points.

To illustrate the group interleaving and puncturing, let us consider a rate $R_c = 2/3$
TCM code, a block interleaver of length $N = 6$, and 8-PSK modulation ($m = 3$).
Hence, the number of information bits per block is $N(m - 1) = 12$, and the interleaving
is performed on pairs of information bits as shown in Figure 8.12-16 where, for example, 
a pair of bits in an even position (2, 4, 6) is mapped to another even position and a pair 
of bits in an odd position is mapped to another odd position. The output of the second 
FIGURE 8.12-16

Turbo TCM encoder shown for 8-PSK with two-dimensional component codes of memory 3.
An example of interleaving with N = 6 is shown. Bold letters indicate that symbols or pairs of
bits correspond to the upper encoder. [From Robertson and Worz (1998); © 1998 IEEE.]




TCM encoder is deinterleaved symbol-wise as illustrated in Figure 8.12-16, and the 
output symbol sequence is obtained by puncturing the two signal-point sequences, i.e., 
by selecting every other symbol from each of the two sequences. That is, we select the 
even-numbered symbols from the top symbol mapper and the odd-numbered symbols 
from the bottom symbol mapper. (In general, some of the information bits can remain 
uncoded, depending on the signal constellation and the signal mapping. In this example, 
both information bits are coded.) 
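
The alternate selection of symbols and the parity-preserving interleaver constraint can be sketched as follows. This is an illustration only: positions are numbered from 1, the symbol sequences are placeholder strings, and the example permutation is hypothetical (it is not the permutation shown in Figure 8.12-16).

```python
def puncture(top, bottom):
    """Merge two equal-length symbol sequences by taking the even-numbered
    symbols from the top mapper and the odd-numbered symbols from the bottom
    mapper (positions numbered from 1, as in the text)."""
    if len(top) != len(bottom):
        raise ValueError("sequences must have equal length")
    return [t if pos % 2 == 0 else b
            for pos, (t, b) in enumerate(zip(top, bottom), start=1)]

def preserves_parity(perm):
    """Check that a 1-based permutation maps even positions to even positions
    and odd positions to odd positions."""
    return all(src % 2 == dst % 2 for src, dst in perm.items())

# Hypothetical interleaver of length N = 6 acting on pairs of information bits.
perm = {1: 3, 2: 6, 3: 5, 4: 2, 5: 1, 6: 4}
N, m = 6, 3
bits_per_block = N * (m - 1)

top_symbols = ["A", "B", "C", "D", "E", "F"]     # placeholder upper-mapper output
bottom_symbols = ["a", "b", "c", "d", "e", "f"]  # placeholder lower-mapper output
transmitted = puncture(top_symbols, bottom_symbols)

print("information bits per block:", bits_per_block)
print("interleaver preserves parity:", preserves_parity(perm))
print("transmitted symbols:", transmitted)
```

Keeping even positions on even positions (and odd on odd) is what lets each pair of information bits be carried exactly once after the alternate puncturing.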

A block diagram of the turbo decoder is shown in Figure 8.12-17. In the conven- 
tional binary iterative turbo decoder, each output of each component decoder is usually 
split into three parts, namely, the systematic part, the a priori part, and the extrinsic 
part, where only the latter is passed between the two decoders. In this TCM scheme, 
the systematic part cannot be separated from the extrinsic component, because the 
noise that affects the parity component also affects the systematic component due to 
the fact that both components are transmitted by the same symbol. This implies that 
the output of the decoders can be split into only two components, namely, the a priori 
information and the extrinsic-systematic information. Hence, each decoder passes the 
extrinsic-systematic information to the other decoder. Each decoder ignores those sym- 
bols where the pertinent parity bit was not sent and obtains the systematic information 



FIGURE 8.12-17 

Turbo TCM decoder corresponding to the encoder in Figure 8.12-16. [From Robertson and 
Worz (1998); © 1998 IEEE.] 
through its a priori input. In the first iteration, the a priori input of the first decoder
is initialized with the missing systematic information. Details of the iterative decoder
computations are given in the paper by Robertson and Worz (1998). An additional
coding gain of about 1.7 dB has been achieved by use of a turbo TCM compared to
conventional TCM, at error rates in the vicinity of $10^{-4}$. This means that turbo TCM
achieves a performance close to the Shannon capacity on an AWGN channel.


8.13

BIBLIOGRAPHICAL NOTES AND REFERENCES 

In parallel with the developments on block codes are the developments in convolu- 
tional codes, which were invented by Elias (1955). The major problem in convolutional 
coding was decoding. Wozencraft and Reiffen (1961) described a sequential decoding 
algorithm for convolutional codes. This algorithm was later modified and refined by 
Fano (1963), and it is now called the Fano algorithm. Subsequently, the stack algorithm 
was devised by Zigangirov (1966) and Jelinek (1969), and the Viterbi algorithm was 
devised by Viterbi (1967). The optimality and the relatively modest complexity for 
small constraint lengths have served to make the Viterbi algorithm the most popular in 
decoding of convolutional codes with K < 10. 

One of the most important contributions in coding during the 1970s was the work of 
Ungerboeck and Csajka (1976) on coding for bandwidth-constrained channels. In this 
paper, it was demonstrated that a significant coding gain can be achieved through the 
introduction of redundancy in a bandwidth-constrained channel, and trellis codes were 
described for achieving coding gains of 3-4 dB. This work has generated much interest 
among researchers and has led to a large number of publications over the past 15 years. 
A number of references can be found in the papers by Ungerboeck (1982, 1987) and 
Forney et al. (1984). The papers by Benedetto et al. (1988, 1994) focus on applications 
and performance evaluation. Additional papers on coded modulation for bandwidth- 
constrained channels may also be found in the Special Issue on Voiceband Telephone 
Data Transmission, IEEE Journal on Selected Areas in Communications (September
1984, August 1989, and December 1989). A comprehensive treatment of trellis-coded 
modulation is given in the book by Biglieri et al. (1991). 

A major new advance in coding and decoding is the construction of parallel and 
serially concatenated codes with interleaving, and the decoding of such codes using 
iterative MAP algorithms. Both PCCC and SCCC have been shown to yield performance 
very close to the Shannon limit with iterative decoding. PCCCs, called turbo codes, 
and the use of iterative decoding were first described in a paper by Berrou et al. (1993). 
Serially concatenated codes with interleaving and their performance have been treated 
in the paper by Benedetto et al. (1998). Turbo coding and decoding is also treated in 
the books by Heegard and Wicker (1999), Johannesson and Zigangirov (1999), and 
Schlegel (1997). Performance bounds for turbo codes are given in the paper by Duman 
and Salehi (1997) and Sason and Shamai (2001a, b). 

Low density parity check codes were introduced by the pioneering work of Gallager 
(1963). Tanner (1981) studied the relation between these codes and graphs, and the work 
of MacKay and Neal (1996) reinstated the interest in these works. Wiberg et al. (1995),
Wiberg (1996), and Forney (2000) extended the work of Tanner on the relation between 
codes and graphs. 

In addition to the references given above on coding, decoding, and coded signal 
design, we should mention the collection of papers published by the IEEE Press enti- 
tled Key Papers in the Development of Coding Theory, edited by Berlekamp (1974). 
This book contains important papers that were published in the first 25 years of coding 
theory. We should also cite the Special Issue on Error-Correcting Codes, IEEE Trans- 
actions on Communications (October 1971). Finally, the survey papers by Calderbank 
(1998), Costello et al. (1998), and Forney and Ungerboeck (1998) highlight the major 
developments in coding and decoding over the past 50 years and include a large number 
of references. 


PROBLEMS 


8.1 A convolutional code is described by

$g_1 = [101]$, $g_2 = [111]$, $g_3 = [111]$

1. Draw the encoder corresponding to this code. 

2. Draw the state-transition diagram for this code. 

3. Draw the trellis diagram for this code. 

4. Find the transfer function and the free distance of this code. 

5. Verify whether or not this code is catastrophic. 

8.2 The convolutional code of Problem 8.1 is used for transmission over an AWGN 
channel with hard decision decoding. The output of the demodulator detector is
(101001011110111 ...). Using the Viterbi algorithm, find the transmitted sequence,
assuming that the convolutional code is terminated at the zero state.


8.3 Repeat Problem 8.1 for a code with 

$g_1 = [110]$, $g_2 = [101]$, $g_3 = [111]$

8.4 The block diagram of a binary convolutional code is shown in Figure P8.4. 

1. Draw the state diagram for the code. 

2. Find the transfer function of the code T (Z). 

3. What is df tee , the minimum free distance of the code? 



FIGURE P8.4 
4. Assume that a message has been encoded by this code and transmitted over a binary
symmetric channel with an error probability of $p = 10^{-5}$. If the received sequence is

r = (110, 110, 110, 111, 010, 101, 101)

using the Viterbi algorithm, find the most likely information sequence, assuming that 
the convolutional code is terminated at the zero state. 

5. Find an upper bound to the bit error probability of the code when the above binary 
symmetric channel is employed. Make any reasonable approximation. 

8.5 The block diagram of a (3, 1) convolutional code is shown in Figure P8.5. 

1. Draw the state diagram of the code. 

2. Find the transfer function $T(Z)$ of the code.

3. Find the minimum free distance ($d_{free}$) of the code, and show the corresponding path
(at distance $d_{free}$ from the all-zero codeword) in the trellis.

4. Determine $G(D)$ for this code. Use $G(D)$ to determine whether this code is catastrophic.

5. Determine $G(D)$ for the RSCC equivalent to this code, and sketch a block diagram of it.

6. Assume that four information bits ($x_1, x_2, x_3, x_4$), followed by two zero bits, have been
encoded and sent via a binary-symmetric channel with crossover probability equal to
0.1. The received sequence is (111, 111, 111, 111, 111, 111). Use the Viterbi decoding
algorithm to find the most likely data sequence, assuming that the convolutional code 
is terminated at the zero state. 



FIGURE P8.5 


8.6 In the convolutional code generated by the encoder shown in Figure P8.6: 

1. Find the transfer function of the code in the form T(Y, Z). 

2. Find $d_{free}$ of the code.

3. If the code is used on a channel with hard decision Viterbi decoding, assuming the
crossover probability of the channel is $p = 10^{-6}$, use the hard decision bound to find
an upper bound on the average bit error probability of the code.



FIGURE P8.6 


8.7 Figure P8.7 depicts a rate 1/2, constraint length K = 2, convolutional code. 

1. Sketch the tree diagram, the trellis diagram, and the state diagram.

2. Solve for the transfer function $T(Y, Z, J)$, and from this, specify the minimum free
distance.


8.8 A rate 1/2, $K = 3$, binary convolutional encoder is shown in Figure P8.8.

1. Draw the tree diagram, the trellis diagram, and the state diagram. 

2. Determine the transfer function T(Y, Z, J), and from this, specify the minimum free 
distance. 

3. Determine the RSCC equivalent to this code, and sketch a block diagram of it. 

4. Determine whether this code is catastrophic. 


8.9 A $k = 1$, $K = 3$, and $n = 2$ convolutional code is characterized by $g_1 = [001]$ and
$g_2 = [101]$.

1. Draw the state diagram for the encoder.

2. Determine the transfer function of the code in the form T(Y, Z). 

3. Is this code a catastrophic code? Why? 

4. Determine the free distance of the code. 

5. If the code is used with hard decision decoding on a channel with crossover probability
of $p = 10^{-3}$, determine an upper bound on the average bit error probability of the
code.

8.10 The block diagram for a convolutional code is given in Figure P8.10. 


1. Draw the state transition diagram for this code. 

2. Is this code catastrophic? Why? 

3. What is the transfer function for this code? 

4. What is the free distance of this code? 

5. Assuming that this code is used for binary data transmission over a binary symmetric
channel with crossover probability of $10^{-3}$, find a bound on the resulting bit error
probability.

FIGURE P8.8

FIGURE P8.10

8.11 The convolutional code shown in Figure P8.10 is used with a binary antipodal signaling
scheme for transmission over an additive noise channel with input-output relation

$$ r_i = c_i + n_i $$

where $c_i \in \{\pm\sqrt{\mathcal{E}_c}\}$ and noise components are iid random variables with PDF

$$ p(n) = \tfrac{1}{2} e^{-|n|} $$

The receiver uses a soft decision ML decoding scheme.

1. Show that the optimal decoding rule is given by

$$ \hat{c} = \arg\min_{c} \sum_i |r_i - c_i| $$

2. Find an upper bound for the average bit error probability for this system. Is this a useful 
bound? Why? 

3. Assuming that $\mathcal{E}_c = 1$ and the code is terminated at the zero state, determine the most
likely information sequence if the received output of the matched filter is

$$ r = (-1, -1, 1.5, 2, 0.7, -0.5, -0.8, -3, 3, 0.2, 0, 1) $$

4. If in part 3 instead of soft decision decoding, hard decision is employed, what is the
most likely information sequence?

5. Answer part 2 for hard decision decoding.

8.12 The block diagram for a convolutional encoder is shown in Figure P8.12.

1. What is the number of states for this code?

2. Determine the transfer function T(Y, Z) for this code, and find its free distance.

3. How many paths at the free distance exist in this code?

4. Is this code catastrophic? Why?

5. Assuming that this code is used for transmission over a binary symmetric channel with
a crossover probability of $10^{-4}$, find a bound on the bit error probability.

FIGURE P8.12

8.13 For the convolutional code shown in Figure P8.12:

1. Determine the matrix G(D).

2. Determine the encoded sequence for the input sequence u = (1001111001) using G(D)
found in part 1.

3. Directly determine the encoded sequence corresponding to u given in part 2, and compare
it with the sequence obtained using G(D).

4. Using G(D), determine whether this code is catastrophic.

8.14 A k = 1, K = 3, and n = 2 convolutional code is characterized by $g_1 = [001]$ and
$g_2 = [110]$.

1. Find the transfer function of the code in the form T(Y, Z).

2. Is this code catastrophic? Why?

3. Find $d_{\text{free}}$ for the code.

4. If the code is used on an AWGN channel using BPSK with hard decision Viterbi
decoding, assuming $\mathcal{E}_b/N_0 = 12.6$ dB, find an upper bound on the average bit error
probability of the code.

8.15 Use Tables 8.3-1 to 8.3-11 to sketch the convolutional encoders for the following codes:

1. Rate 1/2, K = 5, maximum free distance code

2. Rate 1/3, K = 5, maximum free distance code

3. Rate 2/3, K = 2, maximum free distance code 


8.16 Draw the state diagram for the rate 2/3, K = 2, convolutional code indicated in
Problem 8.15, part 3, and, for each transition, show the output sequence and the distance of the
output sequence from the all-zero sequence.


8.17 Consider the K = 3, rate 1/2, convolutional code shown in Figure P8.17. Suppose that
the code is used on a binary symmetric channel and the received sequence for the first
eight branches is 0001100000001001. Trace the decisions on a trellis diagram, and label
the survivors' Hamming distance metric at each node level. If a tie occurs in the metrics
required for a decision, always choose the upper path (arbitrary choice).



FIGURE P8.17 


8.18 Use the transfer function derived in Problem 8.8 for the $R_c = 1/2$, K = 3, convolutional
code to compute the probability of a bit error for an AWGN channel with

a. Hard-decision decoding 

b. Soft-decision decoding 

Compare the performance by plotting the results of the computation on the same graph. 


8.19 Draw the state diagram for the convolutional code generated by the encoder shown in 
Figure P8.19, and thus determine whether the code is catastrophic. Also, give an example 
of a rate 1/2, K = 4, convolutional encoder that exhibits catastrophic error propagation. 



FIGURE P8.19 


8.20 A trellis-coded signal is formed as shown in Figure P8.20 by encoding 1 bit by use of a
rate 1/2 convolutional code, while 3 additional information bits are left uncoded. Perform
the set partitioning of a 32-QAM (cross) constellation, and indicate the subsets in the
partition. By how much is the distance between adjacent signal points increased as a result
of partitioning?

FIGURE P8.20


8.21 Prove Equation 8.4^1. 

8.22 Prove that for all real numbers x, y, and z we have 

$$ \max{}^{*}\{x, y\} = \max\{x, y\} + \ln\left(1 + e^{-|x - y|}\right) $$
$$ \max{}^{*}\{x, y, z\} = \max{}^{*}\{\max{}^{*}\{x, y\}, z\} $$

8.23 A recursive systematic convolutional code is characterized by 

G(D) = [1

This code is used with antipodal signaling with $\mathcal{E}_c = \pm 1$ over an additive white Gaussian
noise channel with noise power spectral density of $N_0/2 = 2$ W/Hz. It is assumed that the
convolutional code is terminated at the zero state and the received sequence is given by

$$ r = (0.3, 0.2, 1, -1.2, 1.2, 1.7, 0.3, -0.6) $$

1. Use the BCJR algorithm to determine the information sequence u. 

2. Use the Viterbi algorithm to determine the information sequence u. 

8.24 Apply the Max-Log-APP algorithm to Problem 8.23, and compare the result with the result
when the BCJR is used.

8.25 Let $X_i$, $1 \le i \le n$, denote a sequence of independent binary random variables, and let
$p_i(0)$ and $p_i(1)$ denote the probabilities that $X_i$ is equal to 0 and 1, respectively. Let

$$ Y = \sum_{i=1}^{n} X_i $$

where the addition is modulo-2, and denote by $p(0)$ and $p(1)$ the probabilities that Y is 0
and 1, respectively.

1. Show that

$$ p(0) - p(1) = \prod_{i=1}^{n} \left( p_i(0) - p_i(1) \right) $$

2. Show that

$$ p(0) = \frac{1}{2} + \frac{1}{2} \prod_{i=1}^{n} \left( p_i(0) - p_i(1) \right) $$

$$ p(1) = \frac{1}{2} - \frac{1}{2} \prod_{i=1}^{n} \left( p_i(0) - p_i(1) \right) $$

3. Using these results, prove Equation 8.10-27.


8.26 Prove Equation 8.10-31 for the equality constraint nodes. 

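8.27 The parity check matrix of a (12, 3) LDPC code is given by

$$
H = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0
\end{bmatrix}
$$

Sketch the Tanner graph for this code.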


8.28 Show that any (n, 1) repetition code is an LDPC code. Determine the general form of the
parity check matrix for an (n, 1) repetition code.


8.29 Sketch the Tanner graph of a (6, 1) repetition code. 



Digital Communication Through 
Band-Limited Channels 


In previous chapters, we considered the transmission of digital information through 
an additive Gaussian noise channel. In effect, no bandwidth constraint was imposed on 
the signal design and the communication system design. 

In this chapter, we consider the problem of signal design when the channel is band-limited
to some specified bandwidth of W Hz. Under this condition, the channel may
be modeled as a linear filter having an equivalent lowpass† frequency response $C(f)$
that is zero for $|f| > W$.

The first topic that is treated is the design of the signal pulse $g(t)$ in a linearly
modulated signal, represented as

$$ v(t) = \sum_{n} I_n g(t - nT) $$

that efficiently utilizes the total available channel bandwidth W. We shall see that when
the channel is ideal for $|f| \le W$, a signal pulse can be designed that allows us to
transmit at symbol rates comparable to or exceeding the channel bandwidth W. On the
other hand, when the channel is not ideal, signal transmission at a symbol rate equal to
or exceeding W results in intersymbol interference (ISI) among a number of adjacent
symbols.

The second topic that we consider is the design of the receiver in the presence of 
intersymbol interference and AWGN. The solution to the ISI problem is to design a 
receiver that employs a means for compensating or reducing the ISI in the received 
signal. The compensator for the ISI is called an equalizer. 

We begin our discussion with a general characterization of band-limited linear filter 
channels. 


†For convenience, the subscript on lowpass equivalent signals is omitted throughout this chapter.

Of the various channels available for digital communications, telephone channels are
by far the most widely used. Such channels are characterized as band-limited linear
filters. This is certainly the proper characterization when frequency-division multiplexing
(FDM) is used as a means for establishing channels in the telephone network. Modern
telephone networks employ pulse-code modulation (PCM) for digitizing and encoding
the analog signal and time-division multiplexing (TDM) for establishing multiple
channels. Nevertheless, filtering is still used on the analog signal prior to sampling and
encoding. Consequently, even though the present telephone network employs a mixture
of FDM and TDM for transmission, the linear filter model for telephone channels is
still appropriate.

For our purposes, a band-limited channel such as a telephone channel will be characterized
as a linear filter having an equivalent lowpass frequency-response characteristic
$C(f)$. Its equivalent lowpass impulse response is denoted by $c(t)$. Then, if a signal of
the form

$$ s(t) = \mathrm{Re}\left[ v(t) e^{j 2\pi f_c t} \right] \tag{9.1-1} $$

is transmitted over a bandpass telephone channel, the equivalent lowpass received
signal is

$$ r_l(t) = \int_{-\infty}^{\infty} v(\tau) c(t - \tau)\, d\tau + z(t) \tag{9.1-2} $$

where the integral represents the convolution of $c(t)$ with $v(t)$, and $z(t)$ denotes the
additive noise. Alternatively, the signal term can be represented in the frequency
domain as $V(f)C(f)$, where $V(f)$ is the Fourier transform of $v(t)$.

If the channel is band-limited to W Hz, then $C(f) = 0$ for $|f| > W$. As a consequence,
any frequency components in $V(f)$ above $|f| = W$ will not be passed by the
channel. For this reason, we limit the bandwidth of the transmitted signal to W Hz also.

Within the bandwidth of the channel, we may express the frequency response
$C(f)$ as

$$ C(f) = |C(f)| e^{j\theta(f)} \tag{9.1-3} $$

where $|C(f)|$ is the amplitude-response characteristic and $\theta(f)$ is the phase-response
characteristic. Furthermore, the envelope delay characteristic is defined as

$$ \tau(f) = -\frac{1}{2\pi} \frac{d\theta(f)}{df} \tag{9.1-4} $$

A channel is said to be nondistorting or ideal if the amplitude response $|C(f)|$ is constant
for all $|f| \le W$ and $\theta(f)$ is a linear function of frequency, i.e., $\tau(f)$ is a constant for all
$|f| \le W$. On the other hand, if $|C(f)|$ is not constant for all $|f| \le W$, we say that the
channel distorts the transmitted signal $V(f)$ in amplitude, and, if $\tau(f)$ is not constant
for all $|f| \le W$, we say that the channel distorts the signal $V(f)$ in delay.
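
As a rough numerical illustration of the envelope delay definition, the sketch below estimates $\tau(f) = -\frac{1}{2\pi}\, d\theta(f)/df$ by finite differences for two phase responses. The frequency grid, the delay `t0`, and the curvature `a` are hypothetical values chosen only for the example; they are not taken from the text.

```python
import numpy as np

def envelope_delay(theta, f):
    """Estimate tau(f) = -(1/(2*pi)) * d(theta)/df by finite differences."""
    return -np.gradient(theta, f) / (2.0 * np.pi)

# Hypothetical frequency grid and phase parameters (illustration only).
f = np.linspace(-3000.0, 3000.0, 6001)   # Hz
t0 = 1e-3                                 # constant delay in seconds
a = 1e-7                                  # delay slope in seconds per Hz

theta_linear = -2.0 * np.pi * t0 * f                      # linear phase
theta_quad = -2.0 * np.pi * (t0 * f + 0.5 * a * f ** 2)   # quadratic phase

tau_linear = envelope_delay(theta_linear, f)   # constant: no delay distortion
tau_quad = envelope_delay(theta_quad, f)       # varies linearly with f

print(tau_linear.min(), tau_linear.max())
print(tau_quad[1], tau_quad[-2])
```

A linear phase yields the same delay at every frequency, so the pulse shape is preserved; the quadratic phase yields a delay that varies linearly with frequency, which is the delay-distortion case.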

As a result of the amplitude and delay distortion caused by the nonideal channel
frequency-response characteristic $C(f)$, a succession of pulses transmitted through the
channel at rates comparable to the bandwidth W are smeared to the point that they are
no longer distinguishable as well-defined pulses at the receiving terminal. Instead, they
overlap, and, thus, we have intersymbol interference. As an example of the effect of
delay distortion on a transmitted pulse, Figure 9.1-1a illustrates a band-limited pulse
having zeros periodically spaced in time at points labeled ±T, ±2T, etc. If information
is conveyed by the pulse amplitude, as in PAM, for example, then one can transmit a
sequence of pulses, each of which has a peak at the periodic zeros of the other pulses.
However, transmission of the pulse through a channel modeled as having a linear
envelope delay characteristic $\tau(f)$ (quadratic phase $\theta(f)$) results in the received pulse
shown in Figure 9.1-1b having zero-crossings that are no longer periodically spaced.
Consequently, a sequence of successive pulses would be smeared into one another and
the peaks of the pulses would no longer be distinguishable. Thus, the channel delay
distortion results in intersymbol interference. As will be discussed in this chapter, it
is possible to compensate for the nonideal frequency-response characteristic of the
channel by use of a filter or equalizer at the demodulator. Figure 9.1-1c illustrates the
output of a linear equalizer that compensates for the linear distortion in the channel.

FIGURE 9.1-1

Effect of channel distortion: (a) channel input; (b) channel output; (c) equalizer output.

FIGURE 9.1-2

Average amplitude and delay characteristics of medium-range telephone channel.

The extent of the intersymbol interference on a telephone channel can be appreciated
by observing a frequency-response characteristic of the channel. Figure 9.1-2
illustrates the measured average amplitude and delay as functions of frequency for a
medium-range (180-725 mi) telephone channel of the switched telecommunications
network as given by Duffy and Thatcher (1971). We observe that the usable band of
the channel extends from about 300 Hz to about 3000 Hz. The corresponding impulse
response of this average channel is shown in Figure 9.1-3. Its duration is about 10 ms.
In comparison, the transmitted symbol rates on such a channel may be of the order
of 2500 pulses or symbols per second. Hence, intersymbol interference might extend
over 20-30 symbols.
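
The 20-30 symbol figure follows from multiplying the impulse-response duration by the symbol rate. A minimal arithmetic sketch, using only the two round numbers quoted above (the variable names are illustrative):

```python
# Values quoted in the text; the names are illustrative.
impulse_response_duration_s = 10e-3   # about 10 ms
symbol_rate_per_s = 2500.0            # about 2500 symbols per second

# Number of symbol intervals covered by one channel impulse response.
symbols_spanned = impulse_response_duration_s * symbol_rate_per_s
print(symbols_spanned)
```

With these nominal values the product falls in the middle of the 20-30 symbol range cited above.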

In addition to linear distortion, signals transmitted through telephone channels are 
subject to other impairments, specifically non-linear distortion, frequency offset, phase 
jitter, impulse noise, and thermal noise. 

Non-linear distortion in telephone channels arises from non-linearities in amplifiers 
and compandors used in the telephone system. This type of distortion is usually small 
and it is very difficult to correct. 

A small frequency offset, usually less than 5 Hz, results from the use of carrier 
equipment in the telephone channel. Such an offset cannot be tolerated in high-speed 
digital transmission systems that use synchronous phase-coherent demodulation. The 
offset is usually compensated for by the carrier recovery loop in the demodulator. 

Phase jitter is basically a low-index frequency modulation of the transmitted signal 
with the low-frequency harmonics of the power line frequency (50-60 Hz). Phase jitter 
poses a serious problem in digital transmission at high rates. However, it can be tracked 
and compensated for, to some extent, at the demodulator. 
FIGURE 9.1-3

Impulse response of average channel with amplitude and delay shown in Figure 9.1-2.

Impulse noise is an additive disturbance. It arises primarily from the switching
equipment in the telephone system. Thermal (Gaussian) noise is also present at levels
of 30 dB or more below the signal.

The degree to which one must be concerned with these channel impairments depends
on the transmission rate over the channel and the modulation technique. For rates
below 1800 bits/s ($R/W < 1$), one can choose a modulation technique, e.g., FSK, that
is relatively insensitive to the amount of distortion encountered on typical telephone
channels from all the sources listed above. For rates between 1800 and 2400 bits/s
($R/W \approx 1$), a more bandwidth-efficient modulation technique such as four-phase
PSK is usually employed. At these rates, some form of compromise equalization is
often employed to compensate for the average amplitude and delay distortion in the
channel. In addition, the carrier recovery method is designed to compensate for the
frequency offset. The other channel impairments are not that serious in their effects
on the error rate performance at these rates. At transmission rates above 2400 bits/s
($R/W > 1$), bandwidth-efficient coded modulation techniques such as trellis-coded
QAM, PAM, and PSK are employed. For such rates, special attention must be paid to
linear distortion, frequency offset, and phase jitter. Linear distortion is usually compensated
for by means of an adaptive equalizer. Phase jitter is handled by a combination
of signal design and some type of phase compensation at the demodulator. At
rates above 9600 bits/s, special attention must be paid not only to linear distortion,
phase jitter, and frequency offset, but also to the other channel impairments mentioned
above.

Unfortunately, a channel model that encompasses all the impairments listed above 
becomes difficult to analyze. For mathematical tractability the channel model that is 
adopted in this and the next chapter is a linear filter that introduces amplitude and delay 
distortion and adds Gaussian noise. 

Besides the telephone channels, there are other physical channels that exhibit some
form of time dispersion and, thus, introduce intersymbol interference. Radio channels
such as shortwave ionospheric channels (HF), tropospheric scatter channels, and mobile
radio channels are examples of time-dispersive channels. In these channels, time dispersion
and, hence, intersymbol interference are the result of multiple propagation paths
with different path delays. The number of paths and the relative time delays among the
paths vary with time, and, for this reason, these radio channels are usually called
time-variant multipath channels. The time-variant multipath conditions give rise to a wide
variety of frequency-response characteristics. Consequently, the frequency-response
characterization that is used for telephone channels is inappropriate for time-variant
multipath channels. Instead, these radio channels are characterized statistically, as explained
in more detail in Chapter 13, in terms of the scattering function, which, in brief,
is a two-dimensional representation of the average received signal power as a function
of relative time delay and Doppler frequency.

In this chapter, we deal exclusively with the linear time-invariant filter model for 
a band-limited channel. The adaptive equalization techniques presented in Chapter 10 
for combating intersymbol interference are also applicable to time-variant multipath 
channels, under the condition that the time variations in the channel are relatively slow in 
comparison to the total channel bandwidth or, equivalently, to the symbol transmission 
rate over the channel. 

It was shown in Chapter 3 that the equivalent lowpass transmitted signal for several
different types of digital modulation techniques has the common form

$$ v(t) = \sum_{n=0}^{\infty} I_n g(t - nT) \tag{9.2-1} $$

where $\{I_n\}$ represents the discrete information-bearing sequence of symbols and $g(t)$
is a pulse that, for the purposes of this discussion, is assumed to have a band-limited
frequency-response characteristic $G(f)$, i.e., $G(f) = 0$ for $|f| > W$. This signal is
transmitted over a channel having a frequency response $C(f)$, also limited to $|f| \le W$.
Consequently, the received signal can be represented as

$$ r_l(t) = \sum_{n=0}^{\infty} I_n h(t - nT) + z(t) \tag{9.2-2} $$

where

$$ h(t) = \int_{-\infty}^{\infty} g(\tau) c(t - \tau)\, d\tau \tag{9.2-3} $$

and $z(t)$ represents the additive white Gaussian noise.

Let us suppose that the received signal is passed first through a filter and then
sampled at a rate 1/T samples/s. We shall show in a subsequent section that the optimum
filter from the point of view of signal detection is one matched to the received pulse.
That is, the frequency response of the receiving filter is $H^*(f)$. We denote the output
of the receiving filter as

$$ y(t) = \sum_{n=0}^{\infty} I_n x(t - nT) + \nu(t) \tag{9.2-4} $$

where $x(t)$ is the pulse representing the response of the receiving filter to the input pulse
$h(t)$ and $\nu(t)$ is the response of the receiving filter to the noise $z(t)$.

Now, if $y(t)$ is sampled at times $t = kT + \tau_0$, $k = 0, 1, \ldots$, we have

$$ y(kT + \tau_0) \equiv y_k = \sum_{n=0}^{\infty} I_n x(kT - nT + \tau_0) + \nu(kT + \tau_0) \tag{9.2-5} $$

or, equivalently,

$$ y_k = \sum_{n=0}^{\infty} I_n x_{k-n} + \nu_k, \qquad k = 0, 1, \ldots \tag{9.2-6} $$

where $\tau_0$ is the transmission delay through the channel. The sample values can be
expressed as

$$ y_k = x_0 \left( I_k + \frac{1}{x_0} \sum_{\substack{n=0 \\ n \ne k}}^{\infty} I_n x_{k-n} \right) + \nu_k, \qquad k = 0, 1, \ldots \tag{9.2-7} $$

FIGURE 9.2-1

Examples of eye patterns for binary and quaternary amplitude-shift keying (or PAM): (a) binary; (b) quaternary.

We regard $x_0$ as an arbitrary scale factor, which we arbitrarily set equal to unity for
convenience. Then

$$ y_k = I_k + \sum_{\substack{n=0 \\ n \ne k}}^{\infty} I_n x_{k-n} + \nu_k \tag{9.2-8} $$

The term $I_k$ represents the desired information symbol at the kth sampling instant, the
term

$$ \sum_{\substack{n=0 \\ n \ne k}}^{\infty} I_n x_{k-n} $$

represents the ISI, and $\nu_k$ is the additive Gaussian noise variable at the kth sampling
instant.
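
The decomposition in Equation 9.2-8 can be reproduced numerically. The sketch below is a minimal illustration, assuming a short hypothetical sampled pulse $(x_{-1}, x_0, x_1) = (0.1, 1, -0.2)$ and a short binary symbol sequence; neither comes from the text, and the function name is made up for the example.

```python
import numpy as np

def receiver_samples(symbols, x_taps, center, noise_std=0.0, seed=0):
    """Return y_k = sum_n I_n x_{k-n} + nu_k for k = 0..len(symbols)-1.

    x_taps[center] holds x_0, so x_taps[center + m] holds x_m.
    """
    rng = np.random.default_rng(seed)
    full = np.convolve(symbols, x_taps)            # full[j] = sum_n I_n x_taps[j - n]
    y = full[center:center + len(symbols)]         # align x_0 with the desired symbol
    return y + noise_std * rng.standard_normal(len(symbols))

# Hypothetical example values (not from the text).
symbols = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0])   # I_0 ... I_5
x_taps = np.array([0.1, 1.0, -0.2])                      # x_{-1}, x_0, x_1
center = 1

y = receiver_samples(symbols, x_taps, center)   # noise-free samples
isi = y - symbols                               # ISI term of Eq. 9.2-8, since x_0 = 1
print(y)
print(isi)
```

Because $x_0 = 1$, subtracting the transmitted symbols from the noise-free samples isolates the ISI term; a nonzero `noise_std` adds the Gaussian term $\nu_k$.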

The amount of intersymbol interference and noise in a digital communication
system can be viewed on an oscilloscope. For PAM signals, we can display the received
signal $y(t)$ on the vertical input with the horizontal sweep rate set at 1/T. The resulting
oscilloscope display is called an eye pattern because of its resemblance to the human
eye. For example, Figure 9.2-1 illustrates the eye patterns for binary and four-level PAM
modulation. The effect of ISI is to cause the eye to close, thereby reducing the margin
for additive noise to cause errors. Figure 9.2-2 graphically illustrates the effect of
intersymbol interference in reducing the opening of a binary eye. Note that intersymbol
interference distorts the position of the zero-crossings and causes a reduction in the eye
opening. Thus, it causes the system to be more sensitive to a synchronization error.

FIGURE 9.2-2

Effect of intersymbol interference on eye opening.

FIGURE 9.2-3

Two-dimensional digital "eye patterns": (a) transmitted eight-phase signal; (b) received signal samples
at the output of demodulator.

For PSK and QAM it is customary to display the "eye pattern" as a two-dimensional
scatter diagram illustrating the sampled values $\{y_k\}$ that represent the decision variables
at the sampling instants. Figure 9.2-3 illustrates such an eye pattern for an 8-PSK
signal. In the absence of intersymbol interference and noise, the superimposed signals
at the sampling instants would result in eight distinct points corresponding to the eight
transmitted signal phases. Intersymbol interference and noise result in a deviation of
the received samples $\{y_k\}$ from the desired 8-PSK signal. The larger the intersymbol
interference and noise, the larger the scattering of the received signal samples relative
to the transmitted signal points.
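
For binary PAM with symbols ±1, the closing of the eye at the sampling instant can be quantified by the worst-case ISI. The sketch below is only an illustration under stated assumptions: the pulse samples are hypothetical, and the "peak distortion" measure (sum of the magnitudes of the interfering samples) is a common figure of merit that this passage does not itself define.

```python
import itertools
import numpy as np

def peak_distortion(x_taps, center):
    """Sum of |x_m| over all m != 0, i.e. the largest possible ISI for +/-1 symbols."""
    taps = np.asarray(x_taps, dtype=float)
    return np.sum(np.abs(taps)) - abs(taps[center])

def worst_case_margin(x_taps, center):
    """Brute-force the smallest noise-free value of I_k * y_k over all +/-1 patterns."""
    taps = np.asarray(x_taps, dtype=float)
    worst = np.inf
    for pattern in itertools.product([-1.0, 1.0], repeat=len(taps)):
        p = np.array(pattern)
        y = np.dot(p, taps)                 # sample whose desired symbol is p[center]
        worst = min(worst, p[center] * y)
    return worst

x_taps = [0.1, 1.0, -0.2]   # hypothetical x_{-1}, x_0, x_1
center = 1

d = peak_distortion(x_taps, center)
margin = worst_case_margin(x_taps, center)
print(d, margin)            # margin equals x_0 - d when the eye is open
```

Without ISI the margin to the decision threshold would be $x_0 = 1$; with these samples it shrinks to $1 - d$, which is the reduced noise margin that a partially closed eye represents.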

Below, we consider the problem of signal design under the condition that there is 
no intersymbol interference at the sampling instants. 


9.2-1 Design of Band-Limited Signals for No Intersymbol 
Interference — The Nyquist Criterion 

For the discussion in this section and in Section 9.2-2, we assume that the band-limited
channel has ideal frequency-response characteristics, i.e., $C(f) = 1$ for $|f| \le W$. Then
the pulse $x(t)$ has a spectral characteristic $X(f) = |G(f)|^2$, where

$$ x(t) = \int_{-W}^{W} X(f) e^{j2\pi f t}\, df \tag{9.2-9} $$

We are interested in determining the spectral properties of the pulse $x(t)$ and, hence,
the transmitted pulse $g(t)$, that results in no intersymbol interference. Since

$$ y_k = I_k + \sum_{\substack{n=0 \\ n \ne k}}^{\infty} I_n x_{k-n} + \nu_k \tag{9.2-10} $$

the condition for no intersymbol interference is

$$ x(t = kT) \equiv x_k = \begin{cases} 1 & k = 0 \\ 0 & k \ne 0 \end{cases} \tag{9.2-11} $$

Below, we derive the necessary and sufficient condition on $X(f)$ in order for $x(t)$
to satisfy the above relation. This condition is known as the Nyquist pulse-shaping
criterion or Nyquist condition for zero ISI and is stated in the following theorem.

THEOREM (NYQUIST). The necessary and sufficient condition for $x(t)$ to satisfy

$$ x(nT) = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0 \end{cases} \tag{9.2-12} $$

is that its Fourier transform $X(f)$ satisfy

$$ \sum_{m=-\infty}^{\infty} X(f + m/T) = T \tag{9.2-13} $$

Proof. In general, $x(t)$ is the inverse Fourier transform of $X(f)$. Hence,

$$ x(t) = \int_{-\infty}^{\infty} X(f) e^{j2\pi f t}\, df \tag{9.2-14} $$

At the sampling instants $t = nT$, this relation becomes

$$ x(nT) = \int_{-\infty}^{\infty} X(f) e^{j2\pi f nT}\, df \tag{9.2-15} $$

Let us break up the integral in Equation 9.2-15 into integrals covering the finite range
of 1/T. Thus, we obtain

$$
\begin{aligned}
x(nT) &= \sum_{m=-\infty}^{\infty} \int_{(2m-1)/2T}^{(2m+1)/2T} X(f) e^{j2\pi f nT}\, df \\
&= \sum_{m=-\infty}^{\infty} \int_{-1/2T}^{1/2T} X(f + m/T) e^{j2\pi f nT}\, df \\
&= \int_{-1/2T}^{1/2T} \left[ \sum_{m=-\infty}^{\infty} X(f + m/T) \right] e^{j2\pi f nT}\, df \\
&= \int_{-1/2T}^{1/2T} B(f) e^{j2\pi f nT}\, df
\end{aligned}
\tag{9.2-16}
$$

where we have defined $B(f)$ as

$$ B(f) = \sum_{m=-\infty}^{\infty} X(f + m/T) \tag{9.2-17} $$

Obviously $B(f)$ is a periodic function with period 1/T, and, therefore, it can be
expanded in terms of its Fourier series coefficients $\{b_n\}$ as

$$ B(f) = \sum_{n=-\infty}^{\infty} b_n e^{j2\pi n f T} \tag{9.2-18} $$

where

$$ b_n = T \int_{-1/2T}^{1/2T} B(f) e^{-j2\pi n f T}\, df \tag{9.2-19} $$

FIGURE 9.2-4

Plot of B(f) for the case T < 1/2W.

Comparing Equations 9.2-19 and 9.2-16, we obtain

$$ b_n = T x(-nT) \tag{9.2-20} $$

Therefore, the necessary and sufficient condition for Equation 9.2-11 to be satisfied is
that

$$ b_n = \begin{cases} T & n = 0 \\ 0 & n \ne 0 \end{cases} \tag{9.2-21} $$

which, when substituted into Equation 9.2-18, yields

$$ B(f) = T \tag{9.2-22} $$

or, equivalently,

$$ \sum_{m=-\infty}^{\infty} X(f + m/T) = T \tag{9.2-23} $$

This concludes the proof of the theorem. 
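
The theorem can be checked numerically on a simple pulse. The sketch below is an illustration under assumptions not made in the text: it uses a triangular spectrum $X(f) = T(1 - |f|T)$ for $|f| \le 1/T$, whose time-domain pulse is $[\sin(\pi t/T)/(\pi t/T)]^2$, and a hypothetical symbol interval. Note that NumPy's `np.sinc(u)` computes $\sin(\pi u)/(\pi u)$, so the argument is passed without the factor $\pi$.

```python
import numpy as np

T = 1.0e-3   # hypothetical symbol interval in seconds

def X_tri(f):
    """Triangular spectrum: T * (1 - |f| T) for |f| <= 1/T, zero elsewhere."""
    return T * np.clip(1.0 - np.abs(f) * T, 0.0, None)

def folded_spectrum(X, f, T, m_max=5):
    """B(f) = sum over m of X(f + m/T), truncated to |m| <= m_max."""
    return sum(X(f + m / T) for m in range(-m_max, m_max + 1))

# Frequency side of the criterion: B(f) should equal T across one period.
f = np.linspace(-0.5 / T, 0.5 / T, 201)
B = folded_spectrum(X_tri, f, T)

# Time side of the criterion: x(nT) should be 1 at n = 0 and 0 elsewhere.
n = np.arange(-5, 6)
x_samples = np.sinc(n) ** 2      # x(nT) for x(t) = [sin(pi t/T)/(pi t/T)]^2

print(B.min() / T, B.max() / T)
print(np.round(x_samples, 12))
```

Both sides of the theorem hold for this pulse: the shifted replicas add to the constant T, and the pulse samples vanish at every nonzero multiple of T.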

Now suppose that the channel has a bandwidth of W. Then $C(f) = 0$ for $|f| > W$
and, consequently, $X(f) = 0$ for $|f| > W$. We distinguish three cases.

1. When $T < 1/2W$, or, equivalently, $1/T > 2W$, since $B(f) = \sum_{n=-\infty}^{\infty} X(f + n/T)$
consists of nonoverlapping replicas of $X(f)$, separated by 1/T as shown in Figure 9.2-4,
there is no choice for $X(f)$ to ensure $B(f) = T$ in this case and there is
no way that we can design a system with no ISI.

2. When $T = 1/2W$, or, equivalently, $1/T = 2W$ (the Nyquist rate), the replications
of $X(f)$, separated by 1/T, are as shown in Figure 9.2-5. It is clear that in this case
there exists only one $X(f)$ that results in $B(f) = T$, namely,

$$ X(f) = \begin{cases} T & |f| < W \\ 0 & \text{otherwise} \end{cases} \tag{9.2-24} $$

which corresponds to the pulse

$$ x(t) = \frac{\sin(\pi t/T)}{\pi t/T} \equiv \mathrm{sinc}\left(\frac{\pi t}{T}\right) \tag{9.2-25} $$

This means that the smallest value of T for which transmission with zero ISI is
possible is $T = 1/2W$, and for this value, $x(t)$ has to be a sinc function. The
difficulty with this choice of $x(t)$ is that it is noncausal and, therefore, nonrealizable.
To make it realizable, usually a delayed version of it, i.e., $\mathrm{sinc}[\pi(t - t_0)/T]$, is used
and $t_0$ is chosen such that for $t < 0$, we have $\mathrm{sinc}[\pi(t - t_0)/T] \approx 0$. Of course, with
this choice of $x(t)$, the sampling time must also be shifted to $mT + t_0$. A second
difficulty with this pulse shape is that its rate of convergence to zero is slow. The
tails of $x(t)$ decay as 1/t; consequently, a small mistiming error in sampling the
output of the matched filter at the demodulator results in an infinite series of ISI
components. Such a series is not absolutely summable because of the 1/t rate of
decay of the pulse, and, hence, the sum of the resulting ISI does not converge.

3. When $T > 1/2W$, $B(f)$ consists of overlapping replications of $X(f)$ separated by
1/T, as shown in Figure 9.2-6. In this case, there exist numerous choices for $X(f)$
such that $B(f) = T$.

FIGURE 9.2-5

Plot of B(f) for the case T = 1/2W.


FIGURE 9.2-6

Plot of B(f) for the case T > 1/2W.

A particular pulse spectrum, for the $T > 1/2W$ case, that has desirable spectral
properties and has been widely used in practice is the raised cosine spectrum. The raised
cosine frequency characteristic is given as (see Problem 9.16)

$$
X_{rc}(f) = \begin{cases}
T & 0 \le |f| \le \dfrac{1-\beta}{2T} \\[2mm]
\dfrac{T}{2}\left\{ 1 + \cos\left[ \dfrac{\pi T}{\beta}\left( |f| - \dfrac{1-\beta}{2T} \right) \right] \right\} & \dfrac{1-\beta}{2T} \le |f| \le \dfrac{1+\beta}{2T} \\[2mm]
0 & |f| > \dfrac{1+\beta}{2T}
\end{cases}
\tag{9.2-26}
$$

where $\beta$ is called the roll-off factor and takes values in the range $0 \le \beta \le 1$. The
bandwidth occupied by the signal beyond the Nyquist frequency 1/2T is called the
excess bandwidth and is usually expressed as a percentage of the Nyquist frequency.
For example, when $\beta = \frac{1}{2}$, the excess bandwidth is 50 percent and when $\beta = 1$, the
excess bandwidth is 100 percent. The pulse $x(t)$, having the raised cosine spectrum, is

$$
x(t) = \frac{\sin(\pi t/T)}{\pi t/T} \, \frac{\cos(\pi\beta t/T)}{1 - 4\beta^2 t^2/T^2}
= \mathrm{sinc}(\pi t/T) \, \frac{\cos(\pi\beta t/T)}{1 - 4\beta^2 t^2/T^2}
\tag{9.2-27}
$$

FIGURE 9.2-7

Pulses having a raised cosine spectrum.

Note that $x(t)$ is normalized so that $x(0) = 1$. Figure 9.2-7 illustrates the raised cosine
spectral characteristics and the corresponding pulses for $\beta = 0$, $\frac{1}{2}$, and 1. Note that
for $\beta = 0$, the pulse reduces to $x(t) = \mathrm{sinc}(\pi t/T)$, and the symbol rate $1/T = 2W$.
When $\beta = 1$, the symbol rate is $1/T = W$. In general, the tails of $x(t)$ decay as $1/t^3$ for
$\beta > 0$. Consequently, a mistiming error in sampling leads to a series of ISI components
that converges to a finite value.
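
The different tail behavior of the sinc pulse ($\beta = 0$) and a raised cosine pulse with $\beta > 0$ under a sampling offset can be seen numerically. The sketch below evaluates Equation 9.2-27 and accumulates the magnitudes of the ISI samples out to increasing distances; the 5 percent timing offset and the truncation lengths are hypothetical choices for the illustration. As before, `np.sinc(u)` is $\sin(\pi u)/(\pi u)$.

```python
import numpy as np

def raised_cosine_pulse(t, T, beta):
    """x(t) of Eq. 9.2-27, with the removable singularity at |t| = T/(2*beta) handled."""
    u = np.asarray(t, dtype=float) / T
    denom = 1.0 - (2.0 * beta * u) ** 2
    singular = np.isclose(denom, 0.0)
    safe_denom = np.where(singular, 1.0, denom)
    x = np.sinc(u) * np.cos(np.pi * beta * u) / safe_denom
    if beta > 0.0:
        # Limit of cos(pi*beta*u) / (1 - 4*beta^2*u^2) at the singular points is pi/4.
        limit = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta))
        x = np.where(singular, limit, x)
    return x

def worst_case_isi(T, beta, offset, n_max):
    """Sum of |x(nT + offset)| over 1 <= |n| <= n_max."""
    n = np.arange(1, n_max + 1)
    right = raised_cosine_pulse(n * T + offset, T, beta)
    left = raised_cosine_pulse(-n * T + offset, T, beta)
    return np.sum(np.abs(right)) + np.sum(np.abs(left))

T = 1.0
eps = 0.05 * T   # hypothetical timing error of 5 percent of a symbol interval

for n_max in (10, 100, 1000, 10000):
    print(n_max,
          worst_case_isi(T, 0.0, eps, n_max),    # beta = 0: keeps growing
          worst_case_isi(T, 0.5, eps, n_max))    # beta = 0.5: settles quickly
```

For $\beta = 0$ the accumulated magnitude grows by a roughly constant amount per decade of `n_max`, which is the logarithmic divergence implied by the 1/t tails; for $\beta = 0.5$ the total stops changing after a few terms.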

Because of the smooth characteristics of the raised cosine spectrum, it is possible
to design practical filters for the transmitter and the receiver that approximate the
overall desired frequency response. In the special case where the channel is ideal, i.e.,
$C(f) = 1$, $|f| \le W$, we have

$$ X_{rc}(f) = G_T(f) G_R(f) \tag{9.2-28} $$

where $G_T(f)$ and $G_R(f)$ are the frequency responses of the two filters. In this case, if
the receiver filter is matched to the transmitter filter, we have
$X_{rc}(f) = G_T(f) G_R(f) = |G_T(f)|^2$. Ideally,

$$ G_T(f) = \sqrt{|X_{rc}(f)|}\, e^{-j2\pi f t_0} \tag{9.2-29} $$

and $G_R(f) = G_T^*(f)$, where $t_0$ is some nominal delay that is required to ensure physical
realizability of the filter. Thus, the overall raised cosine spectral characteristic is split
evenly between the transmitting filter and the receiving filter. Note also that an additional
delay is necessary to ensure the physical realizability of the receiving filter.
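
The even split of the raised cosine characteristic can be verified numerically. The sketch below implements Equation 9.2-26, forms $G_T(f)$ as in Equation 9.2-29 and $G_R(f) = G_T^*(f)$, and checks both that their product returns $X_{rc}(f)$ and that the shifted replicas of $X_{rc}(f)$ add to T. The symbol interval, roll-off factor, and delay are hypothetical values used only for illustration.

```python
import numpy as np

def raised_cosine_spectrum(f, T, beta):
    """X_rc(f) of Eq. 9.2-26 for an array of frequencies f."""
    fa = np.abs(np.asarray(f, dtype=float))
    f1 = (1.0 - beta) / (2.0 * T)
    f2 = (1.0 + beta) / (2.0 * T)
    X = np.zeros_like(fa)
    X[fa <= f1] = T
    if beta > 0.0:
        roll = (fa > f1) & (fa <= f2)
        X[roll] = 0.5 * T * (1.0 + np.cos(np.pi * T / beta * (fa[roll] - f1)))
    return X

def transmit_filter(f, T, beta, t0):
    """G_T(f) = sqrt(|X_rc(f)|) * exp(-j 2 pi f t0), as in Eq. 9.2-29."""
    f = np.asarray(f, dtype=float)
    return np.sqrt(raised_cosine_spectrum(f, T, beta)) * np.exp(-2j * np.pi * f * t0)

# Hypothetical parameters (illustration only).
T = 1.0 / 2400.0     # symbol interval in seconds
beta = 0.35          # roll-off factor
t0 = 4.0 * T         # nominal filter delay

f = np.linspace(-0.5 / T, 0.5 / T, 401)
G_T = transmit_filter(f, T, beta, t0)
G_R = np.conj(G_T)                      # receiving filter matched to G_T
X_rc = raised_cosine_spectrum(f, T, beta)

# Folded spectrum B(f) = sum_m X_rc(f + m/T) over one period.
B = sum(raised_cosine_spectrum(f + m / T, T, beta) for m in range(-2, 3))

print(np.max(np.abs(G_T * G_R - X_rc)))
print(B.min() / T, B.max() / T)
```

The product $G_T(f) G_R(f)$ is real and equal to $X_{rc}(f)$ because the delay phase cancels, and the folded spectrum is flat at T, so the cascade satisfies the Nyquist condition.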


9.2-2 Design of Band-Limited Signals with Controlled 
ISI — Partial-Response Signals 


As we have observed from our discussion of signal design for zero ISI, it is necessary to
reduce the symbol rate 1/T below the Nyquist rate of 2W symbols/s to realize practical
transmitting and receiving filters. On the other hand, suppose we choose to relax the
condition of zero ISI and, thus, achieve a symbol transmission rate of 2W symbols/s.
By allowing for a controlled amount of ISI, we can achieve this symbol rate.

We have already seen that the condition for zero ISI is $x(nT) = 0$ for $n \ne 0$.
However, suppose that we design the band-limited signal to have controlled ISI at one
time instant. This means that we allow one additional nonzero value in the samples
$\{x(nT)\}$. The ISI that we introduce is deterministic or "controlled" and, hence, it can
be taken into account at the receiver, as discussed below.

One special case that leads to (approximately) physically realizable transmitting
and receiving filters is specified by the samples†

$$ x(nT) = \begin{cases} 1 & n = 0, 1 \\ 0 & \text{otherwise} \end{cases} \tag{9.2-30} $$

Now, using Equation 9.2-20, we obtain

$$ b_n = \begin{cases} T & n = 0, -1 \\ 0 & \text{otherwise} \end{cases} \tag{9.2-31} $$

which, when substituted into Equation 9.2-18, yields

$$ B(f) = T + T e^{-j2\pi f T} \tag{9.2-32} $$

As in the preceding section, it is impossible to satisfy the above equation for $T < 1/2W$.
However, for $T = 1/2W$, we obtain

$$
X(f) = \begin{cases}
\dfrac{1}{2W}\left( 1 + e^{-j\pi f/W} \right) & |f| < W \\
0 & \text{otherwise}
\end{cases}
= \begin{cases}
\dfrac{1}{W} e^{-j\pi f/2W} \cos\dfrac{\pi f}{2W} & |f| < W \\
0 & \text{otherwise}
\end{cases}
\tag{9.2-33}
$$

†It is convenient to deal with samples of x(t) that are normalized to unity for n = 0, 1.
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FIGURE 9.2-8 

Time-domain and frequency-domain characteristics of a duobinary signal. 


Therefore, x{t ) is given by 


x{t) = sinc(27 rWt) + sine 


2n 



(9.2-34) 


This pulse is called a duobinary signal pulse. It is illustrated along with its magnitude
spectrum in Figure 9.2-8. Note that the spectrum decays to zero smoothly, which means
that physically realizable filters can be designed that approximate this spectrum very
closely. Thus, a symbol rate of 2W is achieved.
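
As a quick numerical check of Equation 9.2-34, the following sketch evaluates the duobinary pulse at the sampling instants t = nT. It assumes the convention sinc(x) = sin(x)/x used in this section (which differs from the normalized sinc found in many numerical libraries); the bandwidth value W = 1 and the function names are chosen only for illustration.

```python
import math

def sinc(x):
    """Convention assumed here: sinc(x) = sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x

def duobinary_pulse(t, W):
    """x(t) = sinc(2*pi*W*t) + sinc(2*pi*(W*t - 1/2)), Equation 9.2-34."""
    return sinc(2 * math.pi * W * t) + sinc(2 * math.pi * (W * t - 0.5))

W = 1.0            # normalized bandwidth (illustrative assumption)
T = 1 / (2 * W)    # symbol interval at the Nyquist rate
samples = [duobinary_pulse(n * T, W) for n in range(-3, 5)]
print([round(s, 6) for s in samples])
```

Up to floating-point rounding, only the samples at n = 0 and n = 1 are nonzero, which is the controlled ISI specified by Equation 9.2-30.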

Another special case that leads to (approximately) physically realizable transmitting
and receiving filters is specified by the samples

$$x\left(\frac{n}{2W}\right) = x(nT) = \begin{cases} 1, & n = -1 \\ -1, & n = 1 \\ 0, & \text{otherwise} \end{cases} \tag{9.2-35}$$


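The corresponding pulse x(t) is given as

$$x(t) = \operatorname{sinc}\left[\frac{\pi(t + T)}{T}\right] - \operatorname{sinc}\left[\frac{\pi(t - T)}{T}\right] \tag{9.2-36}$$

and its spectrum is

$$X(f) = \begin{cases} \dfrac{1}{2W}\left(e^{j\pi f/W} - e^{-j\pi f/W}\right) = \dfrac{j}{W} \sin\dfrac{\pi f}{W}, & |f| \le W \\ 0, & |f| > W \end{cases} \tag{9.2-37}$$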


This pulse and its magnitude spectrum are illustrated in Figure 9.2-9. It is called a
modified duobinary signal pulse. It is interesting to note that the spectrum of this signal
has a zero at f = 0, making it suitable for transmission over a channel that does not
pass DC.
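
The two properties just mentioned can be checked numerically. The sketch below evaluates the modified duobinary pulse of Equation 9.2-36 at t = nT and the magnitude of its spectrum from Equation 9.2-37 at a few frequencies. The sinc convention sinc(x) = sin(x)/x, the value T = 0.5, and the function names are illustrative assumptions.

```python
import math

def sinc(x):
    """Convention assumed here: sinc(x) = sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x

def mod_duobinary_pulse(t, T):
    """x(t) = sinc(pi (t + T)/T) - sinc(pi (t - T)/T), Equation 9.2-36."""
    return sinc(math.pi * (t + T) / T) - sinc(math.pi * (t - T) / T)

def mod_duobinary_spectrum_mag(f, W):
    """|X(f)| = (1/W) |sin(pi f / W)| for |f| <= W and 0 otherwise (Equation 9.2-37)."""
    return abs(math.sin(math.pi * f / W)) / W if abs(f) <= W else 0.0

T = 0.5              # symbol interval (illustrative assumption)
W = 1 / (2 * T)      # corresponding bandwidth
samples = {n: mod_duobinary_pulse(n * T, T) for n in range(-3, 4)}
print({n: round(v, 6) for n, v in samples.items()})
print(mod_duobinary_spectrum_mag(0.0, W), mod_duobinary_spectrum_mag(W / 2, W))
```

The samples are +1 at n = -1 and -1 at n = 1, and the spectrum magnitude vanishes at f = 0 while peaking at f = W/2.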

One can obtain other interesting and physically realizable filter characteristics, as
shown by Kretzmer (1966) and Lucky et al. (1968), by selecting different values for
the samples {x(n/2W)} and more than two nonzero samples. However, as we select
more nonzero samples, the problem of unraveling the controlled ISI becomes more
cumbersome and impractical.

FIGURE 9.2-9

Time-domain and frequency-domain characteristics of a modified duobinary signal.

In general, the class of band-limited signal pulses that have the form

$$x(t) = \sum_{n=-\infty}^{\infty} x\left(\frac{n}{2W}\right) \operatorname{sinc}\left[2\pi W\left(t - \frac{n}{2W}\right)\right] \tag{9.2-38}$$

and their corresponding spectra

$$X(f) = \begin{cases} \dfrac{1}{2W} \displaystyle\sum_{n=-\infty}^{\infty} x\left(\frac{n}{2W}\right) e^{-jn\pi f/W}, & |f| \le W \\ 0, & |f| > W \end{cases} \tag{9.2-39}$$


are called partial-response signals when controlled ISI is purposely introduced by
selecting two or more nonzero samples from the set {x(n/2W)}. The resulting signal
pulses allow us to transmit information symbols at the Nyquist rate of 2W symbols/s.
The detection of the received symbols in the presence of controlled ISI is described
below.


Alternative characterization of partial-response signals  We conclude this subsection
by presenting another interpretation of a partial-response signal. Suppose that
the partial-response signal is generated, as shown in Figure 9.2-10, by passing the
discrete-time sequence $\{I_n\}$ through a discrete-time filter with coefficients $x_n =
x(n/2W)$, $n = 0, 1, \ldots, N - 1$, and using the output sequence $\{B_n\}$ from this filter
to excite periodically with an input $B_n \delta(t - nT)$ an analog filter having an impulse
response $\operatorname{sinc}(2\pi W t)$. The resulting output signal is identical to the partial-response
signal given by Equation 9.2-38.

FIGURE 9.2-10

An alternative method for generating a partial-response signal.

Since

$$B_n = \sum_{k=0}^{N-1} x_k I_{n-k} \tag{9.2-40}$$

the sequence of symbols $\{B_n\}$ is correlated as a consequence of the filtering performed
on the sequence $\{I_n\}$. In fact, the autocorrelation function of the sequence $\{B_n\}$ is

$$R(m) = E(B_n B_{n+m}) = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} x_k x_l E(I_{n-k} I_{n+m-l}) \tag{9.2-41}$$

When the input sequence is zero-mean and white,

$$E(I_{n-k} I_{n+m-l}) = \delta_{m+k-l} \tag{9.2-42}$$

where we have used the normalization $E(I_n^2) = 1$. Substitution of Equation 9.2-42
into Equation 9.2-41 yields the desired autocorrelation function for $\{B_n\}$ in the form

$$R(m) = \sum_{k=0}^{N-1-|m|} x_k x_{k+|m|}, \qquad m = 0, \pm 1, \ldots, \pm(N - 1) \tag{9.2-43}$$

The corresponding power spectral density is

$$S(f) = \sum_{m=-(N-1)}^{N-1} R(m) e^{-j2\pi f m T} = \left| \sum_{m=0}^{N-1} x_m e^{-j2\pi f m T} \right|^2 \tag{9.2-44}$$


where T = 1/2W and |f| ≤ 1/2T = W. Thus, the partial-response signal designs
provide spectral shaping of the signal transmitted through the channel.
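
The equality of the two forms in Equation 9.2-44 can be verified numerically. The sketch below computes R(m) from Equation 9.2-43 for real tap values and compares the resulting power spectral density with the squared magnitude of the tap transfer function. The tap sets [1, 1] (duobinary) and [1, 0, -1] (modified duobinary written as a causal filter), the normalization W = 1, and the function names are illustrative assumptions.

```python
import cmath
import math

def autocorr(x, m):
    """R(m) = sum_{k=0}^{N-1-|m|} x_k x_{k+|m|} for real taps (Equation 9.2-43)."""
    m = abs(m)
    return sum(x[k] * x[k + m] for k in range(len(x) - m))

def psd_from_autocorr(x, f, T):
    """First form of Equation 9.2-44: sum of R(m) exp(-j 2 pi f m T)."""
    N = len(x)
    total = sum(autocorr(x, m) * cmath.exp(-2j * math.pi * f * m * T)
                for m in range(-(N - 1), N))
    return total.real

def psd_from_taps(x, f, T):
    """Second form of Equation 9.2-44: |sum of x_m exp(-j 2 pi f m T)|^2."""
    return abs(sum(xm * cmath.exp(-2j * math.pi * f * m * T)
                   for m, xm in enumerate(x))) ** 2

W = 1.0
T = 1 / (2 * W)
taps = {"duobinary": [1, 1], "modified duobinary": [1, 0, -1]}
for name, x in taps.items():
    for f in (0.0, 0.25, 0.5, 1.0):
        print(name, f, round(psd_from_autocorr(x, f, T), 6), round(psd_from_taps(x, f, T), 6))
```

For these two tap sets the duobinary spectrum is largest at f = 0 and the modified duobinary spectrum is zero there, in line with the pulse spectra given earlier.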

9.2-3 Data Detection for Controlled ISI

In this section, we describe two methods for detecting the information symbols at the 
receiver when the received signal contains controlled ISI. One is a symbol-by-symbol 
detection method that is relatively easy to implement. The second method is based 
on the maximum-likelihood criterion for detecting a sequence of symbols. The latter 
method minimizes the probability of error but is a little more complex to implement. 
In particular, we consider the detection of the duobinary and the modified duobinary
partial-response signals. In both cases, we assume that the desired spectral characteristic
X(f) for the partial-response signal is split evenly between the transmitting and
receiving filters, i.e., $|G_T(f)| = |G_R(f)| = |X(f)|^{1/2}$. This treatment is based on PAM
signals, but it is easily generalized to QAM and PSK.

Symbol-by-symbol suboptimum detection  For the duobinary signal pulse,
x(nT) = 1, for n = 0, 1, and is zero otherwise. Hence, the samples at the output
of the receiving filter (demodulator) have the form

$$y_m = B_m + \nu_m = I_m + I_{m-1} + \nu_m \tag{9.2-45}$$

where $\{I_m\}$ is the transmitted sequence of amplitudes and $\{\nu_m\}$ is a sequence of additive
Gaussian noise samples. Let us ignore the noise for the moment and consider the binary
case where $I_m = \pm 1$ with equal probability. Then $B_m$ takes on one of three possible
values, namely, $B_m = -2, 0, 2$ with corresponding probabilities 1/4, 1/2, 1/4. If
$I_{m-1}$ is the detected symbol from the (m − 1)th signaling interval, its effect on $B_m$,
the received signal in the mth signaling interval, can be eliminated by subtraction, thus
allowing $I_m$ to be detected. This process can be repeated sequentially for every received
symbol.

The major problem with this procedure is that errors arising from the additive noise
tend to propagate. For example, if $I_{m-1}$ is in error, its effect on $B_m$ is not eliminated
but, in fact, is reinforced by the incorrect subtraction. Consequently, the detection of
$I_m$ is also likely to be in error.

Error propagation can be avoided by precoding the data at the transmitter instead of
eliminating the controlled ISI by subtraction at the receiver. The precoding is performed
on the binary data sequence prior to modulation. From the data sequence $\{D_n\}$ of 1s
and 0s that is to be transmitted, a new sequence $\{P_n\}$, called the precoded sequence, is
generated. For the duobinary signal, the precoded sequence is defined as

$$P_m = D_m \ominus P_{m-1}, \qquad m = 1, 2, \ldots \tag{9.2-46}$$

where $\ominus$ denotes modulo-2 subtraction.† Then we set $I_m = -1$ if $P_m = 0$ and $I_m = 1$
if $P_m = 1$, i.e., $I_m = 2P_m - 1$. Note that this precoding operation is identical to that
described in Section 3.3 in the context of our discussion of an NRZI signal.


†Although this is identical to modulo-2 addition, it is convenient to view the precoding operation for
duobinary in terms of modulo-2 subtraction.

The noise-free samples at the output of the receiving filter are given by

$$\begin{aligned} B_m &= I_m + I_{m-1} \\ &= (2P_m - 1) + (2P_{m-1} - 1) \\ &= 2(P_m + P_{m-1} - 1) \end{aligned} \tag{9.2-47}$$

Consequently,

$$P_m + P_{m-1} = \tfrac{1}{2} B_m + 1 \tag{9.2-48}$$

Since $D_m = P_m \oplus P_{m-1}$, it follows that the data sequence $D_m$ is obtained from $B_m$
using the relation

$$D_m = \tfrac{1}{2} B_m + 1 \pmod 2 \tag{9.2-49}$$

Consequently, if $B_m = \pm 2$, then $D_m = 0$, and if $B_m = 0$, then $D_m = 1$. An example
that illustrates the precoding and decoding operations is given in Table 9.2-1. In the
presence of additive noise, the sampled outputs from the receiving filter are given by
Equation 9.2-45. In this case $y_m = B_m + \nu_m$ is compared with the two thresholds set
at +1 and −1. The data sequence $\{D_n\}$ is obtained according to the detection rule

$$D_m = \begin{cases} 1 & (|y_m| < 1) \\ 0 & (|y_m| \ge 1) \end{cases} \tag{9.2-50}$$
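
The precoding and detection steps of Equations 9.2-46 through 9.2-50 can be sketched in a few lines. The example below uses the data sequence of Table 9.2-1 with the initial precoder value P_0 = 0; the function names are hypothetical and the samples are taken as noise-free.

```python
def duobinary_precode(data, p0=0):
    """P_m = D_m (-) P_{m-1} (mod 2), Equation 9.2-46; returns [P_0, P_1, ...]."""
    p = [p0]
    for d in data:
        p.append((d - p[-1]) % 2)
    return p

def duobinary_detect(y):
    """Threshold rule of Equation 9.2-50 applied to the samples y_m."""
    return [1 if abs(v) < 1 else 0 for v in y]

data = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1]   # data row of Table 9.2-1
P = duobinary_precode(data)
I = [2 * p - 1 for p in P]                              # I_m = 2 P_m - 1
B = [I[m] + I[m - 1] for m in range(1, len(I))]         # Equation 9.2-47
decoded = duobinary_detect(B)
print(P, I, B, decoded, sep="\n")
```

Each decoded symbol depends only on the current sample, so a single wrong decision does not affect later decisions.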


The extension from binary PAM to multilevel PAM signaling using the duobinary
pulses is straightforward. In this case the M-level amplitude sequence $\{I_m\}$ results in a
(noise-free) sequence

$$B_m = I_m + I_{m-1}, \qquad m = 1, 2, \ldots \tag{9.2-51}$$

which has 2M − 1 possible equally spaced levels. The amplitude levels are determined
from the relation

$$I_m = 2P_m - (M - 1) \tag{9.2-52}$$

where $\{P_m\}$ is the precoded sequence that is obtained from an M-level data sequence
$\{D_m\}$ according to the relation

$$P_m = D_m \ominus P_{m-1} \pmod M \tag{9.2-53}$$

where the possible values of the sequence $\{D_m\}$ are 0, 1, 2, . . . , M − 1.

TABLE 9.2-1

Binary Signaling with Duobinary Pulses

Data sequence D_n              1   1   1   0   1   0   0   1   0   0   0   1   1   0   1
Precoded sequence P_n      0   1   0   1   1   0   0   0   1   1   1   1   0   1   1   0
Transmitted sequence I_n  -1   1  -1   1   1  -1  -1  -1   1   1   1   1  -1   1   1  -1
Received sequence B_n          0   0   0   2   0  -2  -2   0   2   2   2   0   0   2   0
Decoded sequence D_n           1   1   1   0   1   0   0   1   0   0   0   1   1   0   1

In the absence of noise, the samples at the output of the receiving filter may be
expressed as

$$B_m = I_m + I_{m-1} = 2[P_m + P_{m-1} - (M - 1)] \tag{9.2-54}$$

Hence,

$$P_m + P_{m-1} = \tfrac{1}{2} B_m + (M - 1) \tag{9.2-55}$$

Since $D_m = P_m + P_{m-1} \pmod M$, it follows that

$$D_m = \tfrac{1}{2} B_m + (M - 1) \pmod M \tag{9.2-56}$$

An example illustrating multilevel precoding and decoding is given in Table 9.2-2. 

In the presence of noise, the received signal-plus-noise is quantized to the nearest 
of the possible signal levels and the rule given above is used on the quantized values to 
recover the data sequence. 
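
A minimal sketch of the multilevel procedure follows, using the four-level data sequence of Table 9.2-2. The precoder, level mapping, and decoder implement Equations 9.2-53, 9.2-52, and 9.2-56; the quantizer and the small alternating perturbation that stands in for noise are illustrative assumptions, as are the function names.

```python
def precode(data, M, p0=0):
    """P_m = D_m (-) P_{m-1} (mod M), Equation 9.2-53; returns [P_0, P_1, ...]."""
    p = [p0]
    for d in data:
        p.append((d - p[-1]) % M)
    return p

def quantize(y, M):
    """Nearest noise-free level among 0, +-2, ..., +-2(M - 1)."""
    q = 2 * round(y / 2)
    return max(-2 * (M - 1), min(2 * (M - 1), q))

def decode(levels, M):
    """D_m = B_m / 2 + (M - 1) (mod M), Equation 9.2-56."""
    return [(b // 2 + (M - 1)) % M for b in levels]

M = 4
data = [0, 0, 1, 3, 1, 2, 0, 3, 3, 2, 0, 1, 0]            # data row of Table 9.2-2
P = precode(data, M)
I = [2 * p - (M - 1) for p in P]                           # Equation 9.2-52
B = [I[m] + I[m - 1] for m in range(1, len(I))]            # Equation 9.2-54
noisy = [b + 0.4 * (-1) ** m for m, b in enumerate(B)]     # stand-in for small noise
decoded = decode([quantize(y, M) for y in noisy], M)
print(P, B, decoded, sep="\n")
```

As long as each perturbation is smaller than half the level spacing, the quantizer recovers the noise-free level and the data are decoded without error.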

In the case of the modified duobinary pulse, the controlled ISI is specified by the
values x(n/2W) = −1, for n = 1, x(n/2W) = 1 for n = −1, and zero otherwise.
Consequently, the noise-free sampled output from the receiving filter is given as

$$B_m = I_m - I_{m-2} \tag{9.2-57}$$

where the M-level sequence $\{I_m\}$ is obtained by mapping a precoded sequence according
to Equation 9.2-52 and

$$P_m = D_m \oplus P_{m-2} \pmod M \tag{9.2-58}$$


TABLE 9.2-2

Four-Level Signal Transmission with Duobinary Pulses

Data sequence D_m              0   0   1   3   1   2   0   3   3   2   0   1   0
Precoded sequence P_m      0   0   0   1   2   3   3   1   2   1   1   3   2   2
Transmitted sequence I_m  -3  -3  -3  -1   1   3   3  -1   1  -1  -1   3   1   1
Received sequence B_m         -6  -6  -4   0   4   6   2   0   0  -2   2   4   2
Decoded sequence D_m           0   0   1   3   1   2   0   3   3   2   0   1   0

FIGURE 9.2-11

Block diagram of modulator and demodulator for partial-response signals.


From these relations, it is easy to show that the detection rule for recovering the data
sequence $\{D_m\}$ from $\{B_m\}$ in the absence of noise is

$$D_m = \tfrac{1}{2} B_m \pmod M \tag{9.2-59}$$

As demonstrated above, the precoding of the data at the transmitter makes it possible 
to detect the received data on a symbol-by-symbol basis without having to look back 
at previously detected symbols. Thus, error propagation is avoided. 
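
The same idea can be sketched for the modified duobinary pulse, using Equations 9.2-57 through 9.2-59. The data sequence, the initial precoder values (0, 0), and the function names below are illustrative assumptions. The last lines corrupt one received level to show that exactly one decoded symbol is affected.

```python
def mod_duobinary_precode(data, M, init=(0, 0)):
    """P_m = D_m (+) P_{m-2} (mod M), Equation 9.2-58; returns the two initial values, then P_1, P_2, ..."""
    p = list(init)
    for d in data:
        p.append((d + p[-2]) % M)
    return p

def mod_duobinary_decode(levels, M):
    """D_m = B_m / 2 (mod M), Equation 9.2-59."""
    return [(b // 2) % M for b in levels]

M = 4
data = [0, 0, 1, 3, 1, 2, 0, 3, 3, 2, 0, 1, 0]       # illustrative data sequence
P = mod_duobinary_precode(data, M)
I = [2 * p - (M - 1) for p in P]                      # Equation 9.2-52
B = [I[m] - I[m - 2] for m in range(2, len(I))]       # Equation 9.2-57
decoded = mod_duobinary_decode(B, M)

corrupted = list(B)
corrupted[5] += 2                                     # one level error at the detector
errors = sum(a != b for a, b in zip(mod_duobinary_decode(corrupted, M), data))
print(decoded == data, errors)
```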

The symbol-by-symbol detection rule described above is not the optimum detection 
scheme for partial-response signals due to the memory inherent in the received signal. 
Nevertheless, symbol-by-symbol detection is relatively simple to implement and is 
used in many practical applications involving duobinary and modified duobinary pulse 
signals. 

Let us determine the probability of error for detection of digital M-ary PAM signaling
using duobinary and modified duobinary pulses. The channel is assumed to be
an ideal band-limited channel with additive white Gaussian noise. The model for the
communication system is shown in Figure 9.2-11.

At the transmitter, the M-level data sequence $\{D_m\}$ is precoded as described previously.
The precoder output is mapped into one of M possible amplitude levels. Then
the transmitting filter with frequency response $G_T(f)$ has an output

$$v(t) = \sum_{n=-\infty}^{\infty} I_n g_T(t - nT) \tag{9.2-60}$$

The partial-response function X(f) is divided equally between the transmitting and
receiving filters. Hence, the receiving filter is matched to the transmitted pulse, and the
cascade of the two filters results in the frequency characteristic

$$|G_T(f) G_R(f)| = |X(f)| \tag{9.2-61}$$

The matched filter output is sampled at t = nT = n/2W and the samples are fed to
the decoder. For the duobinary signal, the output of the matched filter at the sampling
instant may be expressed as

$$y_m = I_m + I_{m-1} + \nu_m = B_m + \nu_m \tag{9.2-62}$$

where $\nu_m$ is the additive noise component. Similarly, the output of the matched filter
for the modified duobinary signal is

$$y_m = I_m - I_{m-2} + \nu_m = B_m + \nu_m \tag{9.2-63}$$

For binary transmission, let $I_m = \pm d$, where 2d is the distance between signal levels.
Then, the corresponding values of $B_m$ are (2d, 0, −2d). For M-ary PAM signal transmission,
where $I_m = \pm d, \pm 3d, \ldots, \pm(M - 1)d$, the received signal levels are $B_m = 0,
\pm 2d, \pm 4d, \ldots, \pm 2(M - 1)d$. Hence, the number of received levels is 2M − 1, and the
scale factor d is equivalent to $x_0 = \mathcal{E}_g$.

The input transmitted symbols $\{I_m\}$ are assumed to be equally probable. Then, for
duobinary and modified duobinary signals, it is easily demonstrated that, in the absence
of noise, the received output levels have a (triangular) probability distribution of the
form

$$P(B = 2md) = \frac{M - |m|}{M^2}, \qquad m = 0, \pm 1, \pm 2, \ldots, \pm(M - 1) \tag{9.2-64}$$

where B denotes the noise-free received level and 2d is the distance between any two 
adjacent received signal levels. 
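
The triangular distribution of Equation 9.2-64 can be confirmed by direct enumeration. The sketch below assumes, as an illustration, that the two symbols contributing to each received level are independent and equally likely among the M amplitudes (in units of d); the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def received_level_pmf(M):
    """Return {m: P(B = 2md)} for B = I_m + I_{m-1}, amplitudes in units of d."""
    levels = [2 * i - (M - 1) for i in range(M)]         # -(M-1), ..., -1, 1, ..., M-1
    counts = Counter((a + b) // 2 for a, b in product(levels, repeat=2))
    return {m: Fraction(c, M * M) for m, c in counts.items()}

for M in (2, 4):
    pmf = received_level_pmf(M)
    print(M, [(m, str(pmf[m])) for m in sorted(pmf)])
```

Because the set of amplitudes is symmetric about zero, the same counts are obtained for the difference of two symbols, which is the modified duobinary case.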

The channel corrupts the signal transmitted through it by the addition of white
Gaussian noise with zero-mean and power spectral density $\tfrac{1}{2} N_0$.

We assume that a symbol error occurs whenever the magnitude of the additive
noise exceeds the distance d. This assumption neglects the rare event that a large noise
component with magnitude exceeding d may result in a received signal level that yields a
correct symbol decision. The noise component $\nu_m$ is zero-mean Gaussian with variance

$$\sigma_\nu^2 = \tfrac{1}{2} N_0 \int_{-W}^{W} |G_R(f)|^2 \, df = \tfrac{1}{2} N_0 \int_{-W}^{W} |X(f)| \, df = \frac{2N_0}{\pi} \tag{9.2-65}$$

for both the duobinary and the modified duobinary signals. Hence, an upper bound on
the symbol probability of error is

$$\begin{aligned} P_e &\le \sum_{m=-(M-2)}^{M-2} P(|y - 2md| > d \mid B = 2md) P(B = 2md) \\ &\quad + 2 P[y + 2(M - 1)d > d \mid B = -2(M - 1)d] \, P[B = -2(M - 1)d] \\ &= P(|y| > d \mid B = 0) \left\{ 2 \sum_{m=0}^{M-1} P(B = 2md) - P(B = 0) - P[B = -2(M - 1)d] \right\} \\ &= (1 - M^{-2}) P(|y| > d \mid B = 0) \end{aligned} \tag{9.2-66}$$


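But

$$P(|y| > d \mid B = 0) = \frac{2}{\sqrt{2\pi} \, \sigma_\nu} \int_d^{\infty} e^{-x^2 / 2\sigma_\nu^2} \, dx = 2Q\left( \sqrt{\frac{\pi d^2}{2N_0}} \right) \tag{9.2-67}$$

Therefore, the average probability of a symbol error is upper-bounded as

$$P_e \le 2\left(1 - \frac{1}{M^2}\right) Q\left( \sqrt{\frac{\pi d^2}{2N_0}} \right) \tag{9.2-68}$$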


The scale factor d in Equation 9.2-68 can be eliminated by expressing it in terms
of the average power transmitted into the channel. For the M-ary PAM signal in which
the transmitted levels are equally probable, the average power at the output of the
transmitting filter is

$$P_{av} = \frac{E(I_m^2)}{T} \int_{-W}^{W} |G_T(f)|^2 \, df = \frac{E(I_m^2)}{T} \int_{-W}^{W} |X(f)| \, df = \frac{4}{\pi T} E(I_m^2) \tag{9.2-69}$$

where $E(I_m^2)$ is the mean square value of the M signal levels, which is

$$E(I_m^2) = \tfrac{1}{3} d^2 (M^2 - 1) \tag{9.2-70}$$

Therefore,

$$d^2 = \frac{3\pi P_{av} T}{4(M^2 - 1)} \tag{9.2-71}$$

By substituting the value of $d^2$ from Equation 9.2-71 into Equation 9.2-68, we obtain
the upper bound on the symbol error probability as

$$P_e \le 2\left(1 - \frac{1}{M^2}\right) Q\left( \sqrt{ \left(\frac{\pi}{4}\right)^2 \frac{6}{M^2 - 1} \frac{\mathcal{E}_{av}}{N_0} } \right) \tag{9.2-72}$$


where $\mathcal{E}_{av}$ is the average energy per transmitted symbol, which can also be expressed
in terms of the average bit energy as $\mathcal{E}_{av} = k \mathcal{E}_{b\,av} = (\log_2 M) \mathcal{E}_{b\,av}$.

The expression in Equation 9.2-72 for the probability of error of M-ary PAM holds
for both duobinary and modified duobinary partial-response signals. If we compare this
result with the error probability of M-ary PAM with zero ISI, which can be obtained
by using a signal pulse with a raised cosine spectrum, we note that the performance of
partial-response duobinary or modified duobinary has a loss of $(\pi/4)^2$, or 2.1 dB. This
loss in SNR is due to the fact that the detector for the partial-response signals makes
decisions on a symbol-by-symbol basis, and ignores the inherent memory contained in
the received signal at its input.
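
The size of this loss and the bound of Equation 9.2-72 are easy to evaluate numerically. In the sketch below, the zero-ISI reference uses the standard M-ary PAM expression 2(1 − 1/M) Q(√(6 E_av / ((M² − 1) N_0))), which is quoted here as an assumption since it is not derived in this section; the function names and the SNR value are illustrative.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pe_partial_response(M, snr):
    """Upper bound of Equation 9.2-72; snr is E_av / N_0 as a linear ratio."""
    arg = (math.pi / 4) ** 2 * 6 * snr / (M ** 2 - 1)
    return 2 * (1 - 1 / M ** 2) * Q(math.sqrt(arg))

def pe_zero_isi(M, snr):
    """Standard M-ary PAM error probability with zero ISI (assumed reference formula)."""
    return 2 * (1 - 1 / M) * Q(math.sqrt(6 * snr / (M ** 2 - 1)))

loss_db = -20 * math.log10(math.pi / 4)     # the factor (pi/4)^2 expressed in dB
snr = 10 ** (12 / 10)                       # 12 dB, an illustrative operating point
print(f"SNR loss: {loss_db:.2f} dB")
for M in (2, 4):
    print(M, pe_partial_response(M, snr), pe_zero_isi(M, snr))
```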


Maximum-likelihood sequence detection  It is clear from the above discussion
that partial-response waveforms are signal waveforms with memory. This memory is
conveniently represented by a trellis. For example, the trellis for the duobinary
partial-response signal for binary data transmission is illustrated in Figure 9.2-12. For binary
modulation, this trellis contains two states, corresponding to the two possible input
values of $I_m$, i.e., $I_m = \pm 1$. Each branch in the trellis is labeled by two numbers. The
first number on the left is the new data bit, i.e., $I_{m+1} = \pm 1$. This number determines
the transition to the new state. The number on the right is the received signal level.

FIGURE 9.2-12

Trellis for duobinary partial-response signal.


The duobinary signal has a memory of length L = 1. Hence, for binary modulation
the trellis has $S_t = 2$ states. In general, for M-ary modulation, the number of trellis
states is $M^L$.

The optimum maximum-likelihood (ML) sequence detector selects the most probable
path through the trellis upon observing the received data sequence $\{y_m\}$ at the
sampling instants t = mT, m = 1, 2, . . . . In general, each node in the trellis will have
M incoming paths and M corresponding metrics. One out of the M incoming paths is
selected as the most probable, based on the values of the metrics, and the other M − 1
paths and their metrics are discarded. The surviving path at each node is then extended
to M new paths, one for each of the M possible input symbols, and the search process
continues. This is basically the Viterbi algorithm for performing the trellis search. Its
performance is calculated in Section 9.3-4.
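
The trellis search just described can be sketched for the binary duobinary case, where the state is the previous symbol. The code below is a minimal illustration rather than a definitive implementation: it assumes a squared Euclidean distance branch metric (appropriate if the noise samples are treated as white Gaussian), a known initial symbol of −1, and a small deterministic perturbation in place of noise. The function name is hypothetical.

```python
def viterbi_duobinary(y, initial_symbol=-1):
    """Sequence detection for y_m = I_m + I_{m-1} + noise with I_m in {-1, +1}.

    The state is the previous symbol; the path metric is accumulated
    squared Euclidean distance (smaller is better).
    """
    symbols = (-1, 1)
    inf = float("inf")
    metric = {s: (0.0 if s == initial_symbol else inf) for s in symbols}
    paths = {s: [] for s in symbols}
    for ym in y:
        new_metric, new_paths = {}, {}
        for cur in symbols:                                   # new state = current symbol
            best_prev = min(symbols,
                            key=lambda prev: metric[prev] + (ym - (cur + prev)) ** 2)
            new_metric[cur] = metric[best_prev] + (ym - (cur + best_prev)) ** 2
            new_paths[cur] = paths[best_prev] + [cur]         # keep one survivor per state
        metric, paths = new_metric, new_paths
    best = min(symbols, key=lambda s: metric[s])
    return paths[best]

B = [0, 0, 0, 2, 0, -2, -2, 0, 2, 2, 2, 0, 0, 2, 0]          # received row of Table 9.2-1
noisy = [b + 0.3 * (-1) ** m for m, b in enumerate(B)]        # stand-in for small noise
detected = viterbi_duobinary(noisy)
print(detected)
```

With two states and two incoming branches per state, each step keeps one survivor per state, which is the M = 2, L = 1 instance of the procedure above.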


9.2-4 Signal Design for Channels with Distortion

In Sections 9.2-1 and 9.2-2, we described signal design criteria for the modulation filter
at the transmitter and the demodulation filter at the receiver when the channel is ideal. In
this section, we perform the signal design under the condition that the channel distorts
the transmitted signal. We assume that the channel frequency response C(f) is known
for |f| ≤ W and that C(f) = 0 for |f| > W. The filter responses $G_T(f)$ and $G_R(f)$
may be selected to minimize the error probability at the detector. The additive channel
noise is assumed to be Gaussian with power spectral density $S_{nn}(f)$. Figure 9.2-13
illustrates the overall system under consideration.

For the signal component at the output of the demodulator, we must satisfy the
condition

$$G_T(f) C(f) G_R(f) = X_d(f) e^{-j2\pi f t_0}, \qquad |f| \le W \tag{9.2-73}$$

where $X_d(f)$ is the desired frequency response of the cascade of the modulator, channel,
and demodulator, and $t_0$ is a time delay that is necessary to ensure the physical
realizability of the modulation and demodulation filters. The desired frequency response
$X_d(f)$ may be selected to yield either zero ISI or controlled ISI at the sampling instants.
We shall consider the case of zero ISI by selecting $X_d(f) = X_{rc}(f)$, where $X_{rc}(f)$ is
the raised cosine spectrum with an arbitrary roll-off factor.

FIGURE 9.2-13

System model for the design of the modulation and demodulation filters.

The noise at the output of the demodulation filter may be expressed as

$$\nu(t) = \int_{-\infty}^{\infty} n(t - \tau) g_R(\tau) \, d\tau \tag{9.2-74}$$

where n(t) is the input to the filter. Since n(t) is zero-mean Gaussian, $\nu(t)$ is zero-mean
Gaussian, with a power spectral density

$$S_{\nu\nu}(f) = S_{nn}(f) |G_R(f)|^2 \tag{9.2-75}$$

For simplicity, we consider binary PAM transmission. Then, the sampled output
of the matched filter is

$$y_m = x_0 I_m + \nu_m = I_m + \nu_m \tag{9.2-76}$$

where $x_0$ is normalized† to unity, $I_m = \pm d$, and $\nu_m$ represents the noise term, which is
zero-mean Gaussian with variance

$$\sigma_\nu^2 = \int_{-\infty}^{\infty} S_{nn}(f) |G_R(f)|^2 \, df \tag{9.2-77}$$


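Consequently, the probability of error is

$$P_2 = \frac{1}{\sqrt{2\pi}} \int_{d/\sigma_\nu}^{\infty} e^{-y^2/2} \, dy = Q\left( \sqrt{\frac{d^2}{\sigma_\nu^2}} \right) \tag{9.2-78}$$

The probability of error is minimized by maximizing the ratio $d^2/\sigma_\nu^2$ or, equivalently,
by minimizing the noise-to-signal ratio $\sigma_\nu^2/d^2$.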

Let us consider two possible solutions for the case in which the additive Gaussian
noise is white, so that $S_{nn}(f) = N_0/2$. First, suppose that we precompensate for the
total channel distortion at the transmitter, so that the filter at the receiver is matched to
the received signal. In this case, the transmitter and receiver filters have the magnitude
characteristics

$$\begin{aligned} |G_T(f)| &= \frac{\sqrt{X_{rc}(f)}}{|C(f)|}, & |f| \le W \\ |G_R(f)| &= \sqrt{X_{rc}(f)}, & |f| \le W \end{aligned} \tag{9.2-79}$$


†By setting $x_0 = 1$ and $I_m = \pm d$, the scaling by $x_0$ is incorporated into the parameter d.

The phase characteristic of the channel frequency response C(f) may also be compensated
at the transmitter filter. For these filter characteristics, the average transmitted
power is

$$P_{av} = \frac{E(I_m^2)}{T} \int_{-\infty}^{\infty} g_T^2(t) \, dt = \frac{d^2}{T} \int_{-W}^{W} |G_T(f)|^2 \, df = \frac{d^2}{T} \int_{-W}^{W} \frac{X_{rc}(f)}{|C(f)|^2} \, df \tag{9.2-80}$$

and, hence,

$$d^2 = P_{av} T \left[ \int_{-W}^{W} \frac{X_{rc}(f)}{|C(f)|^2} \, df \right]^{-1} \tag{9.2-81}$$


The noise variance at the output of the receiver filter is $\sigma_\nu^2 = N_0/2$ and, hence, the
SNR at the detector is

$$\frac{d^2}{\sigma_\nu^2} = \frac{2 P_{av} T}{N_0} \left[ \int_{-W}^{W} \frac{X_{rc}(f)}{|C(f)|^2} \, df \right]^{-1} \tag{9.2-82}$$


As an alternative, suppose we split the channel compensation equally between the
transmitter and receiver filters, i.e.,

$$\begin{aligned} |G_T(f)| &= \frac{\sqrt{X_{rc}(f)}}{|C(f)|^{1/2}}, & |f| \le W \\ |G_R(f)| &= \frac{\sqrt{X_{rc}(f)}}{|C(f)|^{1/2}}, & |f| \le W \end{aligned} \tag{9.2-83}$$


The phase characteristic of C(f) may also be split equally between the transmitter
and receiver filters. In this case, the average transmitter power is

$$P_{av} = \frac{d^2}{T} \int_{-W}^{W} \frac{X_{rc}(f)}{|C(f)|} \, df \tag{9.2-84}$$

and the noise variance at the output of the receiver filter is

$$\sigma_\nu^2 = \frac{N_0}{2} \int_{-W}^{W} \frac{X_{rc}(f)}{|C(f)|} \, df \tag{9.2-85}$$

Hence, the SNR at the detector is

$$\frac{d^2}{\sigma_\nu^2} = \frac{2 P_{av} T}{N_0} \left[ \int_{-W}^{W} \frac{X_{rc}(f)}{|C(f)|} \, df \right]^{-2} \tag{9.2-86}$$


From Equations 9.2-82 and 9.2-86, we observe that when we express the SNR
$d^2/\sigma_\nu^2$ in terms of the average transmitter power $P_{av}$, there is a loss incurred due to
channel distortion. In the case of the filters given by Equation 9.2-79, the loss is

$$10 \log_{10} \int_{-W}^{W} \frac{X_{rc}(f)}{|C(f)|^2} \, df \tag{9.2-87}$$

and, in the case of the filters given by Equation 9.2-83, the loss is

$$10 \log_{10} \left[ \int_{-W}^{W} \frac{X_{rc}(f)}{|C(f)|} \, df \right]^2 \tag{9.2-88}$$

We observe that when C(f) = 1 for |f| ≤ W, the channel is ideal and

$$\int_{-W}^{W} X_{rc}(f) \, df = 1 \tag{9.2-89}$$


so that no loss is incurred. On the other hand, when there is amplitude distortion,
|C(f)| < 1 for some range of frequencies in the band |f| ≤ W and, hence, there is a
loss in SNR as given by Equations 9.2-87 and 9.2-88. The interested reader may show
(see Problem 9.30) that the filters given by Equation 9.2-83 result in the smaller SNR
loss.


EXAMPLE 9.2-1. Let us determine the transmitting and receiving filters given by
Equation 9.2-83 for a binary communication system that transmits data at a rate of
4800 bits/s over a channel with frequency (magnitude) response

$$|C(f)| = \frac{1}{\sqrt{1 + (f/W)^2}}, \qquad |f| \le W \tag{9.2-90}$$

where W = 4800 Hz. The additive noise is zero-mean white Gaussian with spectral
density $\tfrac{1}{2} N_0 = 10^{-15}$ W/Hz.

Since W = 1/T = 4800, we use a signal pulse with a raised cosine spectrum and
β = 1. Thus,

$$X_{rc}(f) = \tfrac{1}{2} T [1 + \cos(\pi T |f|)] = T \cos^2\left( \frac{\pi |f|}{9600} \right) \tag{9.2-91}$$

Then, apart from the constant scale factor $\sqrt{T}$,

$$|G_T(f)| = |G_R(f)| = \left[ 1 + \left( \frac{f}{4800} \right)^2 \right]^{1/4} \cos\left( \frac{\pi |f|}{9600} \right), \qquad |f| \le 4800 \tag{9.2-92}$$

and $|G_T(f)| = |G_R(f)| = 0$, otherwise. Figure 9.2-14 illustrates the filter characteristic
$G_T(f)$.

One can now use these filters to determine the amount of transmitted energy $\mathcal{E}$
required to achieve a specified error probability. This problem is left as an exercise for
the reader.
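
For the channel of this example, the two SNR losses of Equations 9.2-87 and 9.2-88 can be compared numerically. The sketch below uses a simple midpoint-rule integration; the integration routine, the number of subintervals, and the function names are illustrative assumptions.

```python
import math

W = 4800.0          # channel bandwidth in Hz (Example 9.2-1)
T = 1.0 / W         # symbol interval for 4800 symbols/s

def x_rc(f):
    """Raised cosine spectrum with roll-off factor 1 (Equation 9.2-91)."""
    return T * math.cos(math.pi * abs(f) / (2 * W)) ** 2 if abs(f) <= W else 0.0

def c_mag(f):
    """Channel magnitude response of Equation 9.2-90."""
    return 1.0 / math.sqrt(1.0 + (f / W) ** 2)

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

area = integrate(x_rc, -W, W)                                   # Equation 9.2-89
loss_tx_db = 10 * math.log10(integrate(lambda f: x_rc(f) / c_mag(f) ** 2, -W, W))
loss_split_db = 10 * math.log10(integrate(lambda f: x_rc(f) / c_mag(f), -W, W) ** 2)
print(f"integral of X_rc: {area:.4f}")
print(f"loss with transmitter compensation (9.2-87): {loss_tx_db:.3f} dB")
print(f"loss with split compensation (9.2-88):       {loss_split_db:.3f} dB")
```

The split-compensation loss comes out smaller, consistent with the statement that the filters of Equation 9.2-83 give the smaller SNR loss.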



FIGURE 9.2-14

Frequency response of an optimum transmitter filter.


9.3 OPTIMUM RECEIVER FOR CHANNELS WITH ISI AND AWGN

In this section, we derive the structure of the optimum demodulator and detector for digital
transmission through a nonideal band-limited channel with additive Gaussian noise.
We begin with the transmitted (equivalent lowpass) signal given by Equation 9.2-1.
The received (equivalent lowpass) signal is expressed as

$$r_l(t) = \sum_n I_n h(t - nT) + z(t) \tag{9.3-1}$$

where h(t) represents the response of the channel to the input signal pulse g(t) and z(t)
represents the additive white Gaussian noise.

First we demonstrate that the optimum demodulator can be realized as a filter
matched to h(t), followed by a sampler operating at the symbol rate 1/T and a subsequent
processing algorithm for estimating the information sequence $\{I_n\}$ from the
sample values. Consequently, the samples at the output of the matched filter are sufficient
for the estimation of the sequence $\{I_n\}$.


9.3-1 Optimum Maximum-Likelihood Receiver 

Using the Karhunen-Loève expansion, we expand the received signal $r_l(t)$ in the series

$$r_l(t) = \lim_{N \to \infty} \sum_{k=1}^{N} r_k \phi_k(t) \tag{9.3-2}$$

where $\{\phi_k(t)\}$ is a complete set of orthonormal functions and $\{r_k\}$ are the observable
random variables obtained by projecting $r_l(t)$ onto the set $\{\phi_k(t)\}$. It is easily shown
that

$$r_k = \sum_n I_n h_{kn} + z_k, \qquad k = 1, 2, \ldots \tag{9.3-3}$$

where $h_{kn}$ is the value obtained from projecting h(t − nT) onto $\phi_k(t)$ and $z_k$ is the
value obtained from projecting z(t) onto $\phi_k(t)$. The sequence $\{z_k\}$ is Gaussian with
zero-mean and covariance

$$E(z_k^* z_m) = 2N_0 \delta_{km} \tag{9.3-4}$$


The joint probability density function of the random variables $\mathbf{r}_N = [r_1 \; r_2 \; \cdots \; r_N]$
conditioned on the transmitted sequence $\mathbf{I}_p = [I_1 \; I_2 \; \cdots \; I_p]$, where p ≤ N, is

$$p(\mathbf{r}_N \mid \mathbf{I}_p) = \left( \frac{1}{2\pi N_0} \right)^N \exp\left( -\frac{1}{2N_0} \sum_{k=1}^{N} \left| r_k - \sum_n I_n h_{kn} \right|^2 \right) \tag{9.3-5}$$

In the limit as the number N of observable random variables approaches infinity, the
logarithm of $p(\mathbf{r}_N \mid \mathbf{I}_p)$ is proportional to the metrics $PM(\mathbf{I}_p)$, defined as

$$\begin{aligned} PM(\mathbf{I}_p) &= -\int_{-\infty}^{\infty} \left| r_l(t) - \sum_n I_n h(t - nT) \right|^2 dt \\ &= -\int_{-\infty}^{\infty} |r_l(t)|^2 \, dt + 2 \operatorname{Re} \sum_n \left[ I_n^* \int_{-\infty}^{\infty} r_l(t) h^*(t - nT) \, dt \right] \\ &\quad - \sum_n \sum_m I_n^* I_m \int_{-\infty}^{\infty} h^*(t - nT) h(t - mT) \, dt \end{aligned} \tag{9.3-6}$$

The maximum-likelihood estimates of the symbols $I_1, I_2, \ldots, I_p$ are those that
maximize this quantity. Note, however, that the integral of $|r_l(t)|^2$ is common to all
metrics, and, hence, it may be discarded. The other integral involving $r_l(t)$ gives rise to
the variables

$$y_n \equiv y(nT) = \int_{-\infty}^{\infty} r_l(t) h^*(t - nT) \, dt \tag{9.3-7}$$

These variables can be generated by passing $r_l(t)$ through a filter matched to h(t) and
sampling the output at the symbol rate 1/T. The samples $\{y_n\}$ form a set of sufficient
statistics for the computation of $PM(\mathbf{I}_p)$ or, equivalently, of the correlation metrics

$$CM(\mathbf{I}_p) = 2 \operatorname{Re} \left( \sum_n I_n^* y_n \right) - \sum_n \sum_m I_n^* I_m x_{n-m} \tag{9.3-8}$$

where, by definition, x(t) is the response of the matched filter to h(t) and

$$x_n \equiv x(nT) = \int_{-\infty}^{\infty} h^*(t) h(t + nT) \, dt \tag{9.3-9}$$

Hence, x(t) represents the output of a filter having an impulse response h*(−t) and
an excitation h(t). In other words, x(t) represents the autocorrelation function of h(t).
Consequently, $\{x_n\}$ represents the samples of the autocorrelation function of h(t), taken
periodically at 1/T. We are not particularly concerned with the noncausal characteristic
of the filter matched to h(t), since, in practice, we can introduce a sufficiently large
delay to ensure causality of the matched filter.

If we substitute for $r_l(t)$ in Equation 9.3-7 using Equation 9.3-1, we obtain

$$y_k = \sum_n I_n x_{k-n} + \nu_k \tag{9.3-10}$$

where $\nu_k$ denotes the additive noise sequence of the output of the matched filter, i.e.,

$$\nu_k = \int_{-\infty}^{\infty} z(t) h^*(t - kT) \, dt \tag{9.3-11}$$

The output of the demodulator (matched filter) at the sampling instants is corrupted
by ISI as indicated by Equation 9.3-10. In any practical system, it is reasonable to
assume that the ISI affects a finite number of symbols. Hence, we may assume that
$x_n = 0$ for |n| > L. Consequently, the ISI observed at the output of the demodulator
may be viewed as the output of a finite state machine. This implies that the channel
output with ISI may be represented by a trellis diagram, and the maximum-likelihood
estimate of the information sequence $(I_1, I_2, \ldots, I_p)$ is simply the most probable path
through the trellis given the received demodulator output sequence $\{y_n\}$. Clearly, the
Viterbi algorithm provides an efficient means for performing the trellis search.

FIGURE 9.3-1

Optimum receiver for an AWGN channel with ISI.

The metrics that are computed for the MLSE of the sequence $\{I_k\}$ are given by
Equation 9.3-8. It can be seen that these metrics can be computed recursively in the
Viterbi algorithm, according to the relation

$$CM_n(\mathbf{I}_n) = CM_{n-1}(\mathbf{I}_{n-1}) + \operatorname{Re} \left[ I_n^* \left( 2y_n - x_0 I_n - 2 \sum_{m=1}^{L} x_m I_{n-m} \right) \right] \tag{9.3-12}$$

Figure 9.3-1 illustrates the block diagram of the optimum receiver for an AWGN
channel with ISI.


9.3-2 A Discrete-Time Model for a Channel with ISI 


In dealing with band-limited channels that result in ISI, it is convenient to develop
an equivalent discrete-time model for the analog (continuous-time) system. Since the
transmitter sends discrete-time symbols at a rate of 1/T symbols/s and the sampled
output of the matched filter at the receiver is also a discrete-time signal with samples
occurring at a rate of 1/T per second, it follows that the cascade of the analog filter
at the transmitter with impulse response g(t), the channel with impulse response c(t),
the matched filter at the receiver with impulse response h*(−t), and the sampler can be
represented by an equivalent discrete-time transversal filter having tap gain coefficients
$\{x_k\}$. Consequently, we have an equivalent discrete-time transversal filter that spans a
time interval of 2LT seconds. Its input is the sequence of information symbols $\{I_k\}$ and
its output is the discrete-time sequence $\{y_k\}$ given by Equation 9.3-10. The equivalent
discrete-time model is shown in Figure 9.3-2.

The major difficulty with this discrete-time model occurs in the evaluation of
performance of the various equalization or estimation techniques that are discussed
in the following sections. The difficulty is caused by the correlations in the noise
sequence $\{\nu_k\}$ at the output of the matched filter. That is, the set of noise variables $\{\nu_k\}$
is a Gaussian-distributed sequence with zero-mean and autocorrelation function (see
Problem 9.36)

$$E(\nu_k^* \nu_j) = \begin{cases} 2N_0 x_{j-k} & (|k - j| \le L) \\ 0 & (\text{otherwise}) \end{cases} \tag{9.3-13}$$

FIGURE 9.3-2

Equivalent discrete-time model of channel with intersymbol interference.

Hence, the noise sequence is correlated unless $x_k = 0$, k ≠ 0. Since it is more convenient
to deal with the white noise sequence when calculating the error rate performance, it
is desirable to whiten the noise sequence by further filtering the sequence $\{y_k\}$. A
discrete-time noise-whitening filter is determined as follows.

Let X(z) denote the (two-sided) z transform of the sampled autocorrelation function
$\{x_k\}$, i.e.,

$$X(z) = \sum_{k=-L}^{L} x_k z^{-k} \tag{9.3-14}$$

Since Xk = x*_ k , it follows that X(z) = X*(l/z*) and the 2 L roots of X(z) have the 
symmetry that if p is a root, 1/p* is also a root. Hence, X(z.) can be factored and 
expressed as 

X(z) = F(z)F* (i) (9.3-15) 

where $F(z)$ is a polynomial of degree $L$ having the roots $\rho_1, \rho_2, \ldots, \rho_L$ and $F^*(1/z^*)$ is 
a polynomial of degree $L$ having the roots $1/\rho_1^*, 1/\rho_2^*, \ldots, 1/\rho_L^*$. Assuming that there 
are no roots on the unit circle, an appropriate noise-whitening filter has a z transform 
$1/F^*(1/z^*)$. Since there are $2^L$ possible choices for the roots of $F^*(1/z^*)$, each choice 
resulting in a filter characteristic that is identical in magnitude but different in phase 
from other choices of the roots, we propose to choose the unique $F^*(1/z^*)$ that results 
in an anticausal impulse response with poles corresponding to the zeros of $X(z)$ that are 
outside the unit circle. Such an anticausal filter is stable. Selecting the noise-whitening 
filter in this manner ensures that the resulting channel response, characterized by $F(z)$, 
is minimum phase. Consequently, passage of the sequence $\{y_k\}$ through the digital filter 
$1/F^*(1/z^*)$ results in an output sequence $\{v_k\}$ that can be expressed as 

$$v_k = \sum_{n=0}^{L} f_n I_{k-n} + \eta_k \tag{9.3-16}$$

where $\{\eta_k\}$ is a white Gaussian noise sequence and $\{f_k\}$ is a set of tap coefficients of an 
equivalent discrete-time transversal filter having a transfer function $F(z)$. The cascade 
of the matched filter, the sampler, and the noise-whitening filter is called the whitened 
matched filter (WMF). 

It is convenient to normalize the energy of $F(z)$ to unity, i.e., 

$$\sum_{n=0}^{L} |f_n|^2 = 1$$

The minimum-phase condition on $F(z)$ implies that the energy in the first $M$ values of 
the impulse response $\{f_0, f_1, \ldots, f_M\}$ is a maximum for every $M$. 

In summary, the cascade of the transmitting filter $g(t)$, the channel $c(t)$, the matched 
filter $h^*(-t)$, the sampler, and the discrete-time noise-whitening filter $1/F^*(1/z^*)$ can be 
represented as an equivalent discrete-time transversal filter having the set $\{f_k\}$ as its tap 
coefficients. The additive noise sequence $\{\eta_k\}$ corrupting the output of the discrete-time 
transversal filter is a white Gaussian noise sequence having zero mean and variance 
$N_0$. Figure 9.3-3 illustrates the model of the equivalent discrete-time system with 
white noise. We refer to this model as the equivalent discrete-time white noise filter 
model. 

EXAMPLE 9.3-1. Suppose that the transmitter signal pulse $g(t)$ has duration $T$ and unit 
energy and the received signal pulse is $h(t) = g(t) + a\,g(t - T)$. Let us determine the 
equivalent discrete-time white noise filter model. The sampled autocorrelation function 
is given by 

$$x_k = \begin{cases} a^* & (k = -1) \\ 1 + |a|^2 & (k = 0) \\ a & (k = 1) \end{cases} \tag{9.3-17}$$


$z^{-1}$ = delay of $T$ 

FIGURE 9.3-3 

Equivalent discrete-time model of intersymbol interference channel with AWGN. 

The z transform of $\{x_k\}$ is 

$$X(z) = \sum_{k=-1}^{1} x_k z^{-k} = a^* z + (1 + |a|^2) + a z^{-1} = (a z^{-1} + 1)(a^* z + 1) \tag{9.3-18}$$

Under the assumption that $|a| < 1$, one chooses $F(z) = a z^{-1} + 1$, so that the equivalent 
transversal filter consists of two taps having tap gain coefficients $f_0 = 1$, $f_1 = a$. Note 
that the correlation sequence $\{x_k\}$ may be expressed in terms of the $\{f_n\}$ as 

$$x_k = \sum_{n=0}^{L-k} f_n^* f_{n+k}, \qquad k = 0, 1, 2, \ldots, L \tag{9.3-19}$$
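The relation in Equation 9.3-19 can be checked numerically for the two-tap channel of Example 9.3-1. The following is a minimal sketch, not part of the original text; the function name `autocorrelation_from_taps` and the sample value of `a` are illustrative assumptions.

```python
def autocorrelation_from_taps(f):
    """Return {k: x_k} for k = -L..L using x_k = sum_n conj(f_n) * f_{n+k}."""
    L = len(f) - 1
    x = {}
    for k in range(L + 1):
        xk = sum(f[n].conjugate() * f[n + k] for n in range(L - k + 1))
        x[k] = xk
        x[-k] = xk.conjugate()  # Hermitian symmetry: x_{-k} = conj(x_k)
    return x


# Example 9.3-1 with an arbitrary (assumed) echo gain |a| < 1.
a = 0.5 + 0.25j
x = autocorrelation_from_taps([1 + 0j, a])
```

With $f_0 = 1$ and $f_1 = a$, the result reproduces Equation 9.3-17: `x[1]` equals `a`, `x[-1]` equals its conjugate, and `x[0]` equals $1 + |a|^2$.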

When the channel impulse response is changing slowly with time, the matched 
filter at the receiver becomes a time-variable filter. In this case, the time variations 
of the channel/matched-filter pair result in a discrete-time filter with time-variable 
coefficients. As a consequence, we have time-variable intersymbol interference effects, 
which can be modeled by the filter illustrated in Figure 9.3-3, where the tap coefficients 
are slowly varying with time. 

The discrete-time white noise linear filter model for the intersymbol interference 
effects that arise in high-speed digital transmission over nonideal band-limited channels 
will be used throughout the remainder of this chapter in our discussion of compensa- 
tion techniques for the interference. In general, the compensation methods are called 
equalization techniques or equalization algorithms. 

9.3-3 Maximum-Likelihood Sequence Estimation (MLSE) 
for the Discrete-Time White Noise Filter Model 

In the presence of intersymbol interference that spans L + 1 symbols ( L interfering 
components), the MLSE criterion is equivalent to the problem of estimating the state of a 
discrete-time finite-state machine. The finite-state machine in this case is the equivalent 
discrete-time channel with coefficients $\{f_k\}$, and its state at any instant in time is given 
by the $L$ most recent inputs, i.e., the state at time $k$ is 

$$S_k = (I_{k-1}, I_{k-2}, \ldots, I_{k-L}) \tag{9.3-20}$$

where $I_k = 0$ for $k \le 0$. Hence, if the information symbols are $M$-ary, the channel filter 
has $M^L$ states. Consequently, the channel is described by an $M^L$-state trellis and the 
Viterbi algorithm may be used to determine the most probable path through the trellis. 

The metrics used in the trellis search are akin to the metrics used in soft-decision 
decoding of convolutional codes. In brief, we begin with the samples $v_1, v_2, \ldots, v_{L+1}$, 
from which we compute the $M^{L+1}$ metrics 

$$\sum_{k=1}^{L+1} \ln p(v_k \mid I_k, I_{k-1}, \ldots, I_{k-L}) \tag{9.3-21}$$

The $M^{L+1}$ possible sequences of $I_{L+1}, I_L, \ldots, I_2, I_1$ are subdivided into $M^L$ groups 
corresponding to the $M^L$ states $(I_{L+1}, I_L, \ldots, I_2)$. Note that the $M$ sequences in each 
group (state) differ in $I_1$ and correspond to the paths through the trellis that merge at a 
single node. From the $M$ sequences in each of the $M^L$ states, we select the sequence 
with the largest probability (with respect to $I_1$) and assign to the surviving sequence 
the metric 

$$PM_1(\mathbf{I}_{L+1}) \equiv PM_1(I_{L+1}, I_L, \ldots, I_2) = \max_{I_1} \sum_{k=1}^{L+1} \ln p(v_k \mid I_k, I_{k-1}, \ldots, I_{k-L}) \tag{9.3-22}$$

The $M - 1$ remaining sequences from each of the $M^L$ groups are discarded. Thus, we 
are left with $M^L$ surviving sequences and their metrics. 

Upon reception of $v_{L+2}$, the $M^L$ surviving sequences are extended by one stage, and 
the corresponding $M^{L+1}$ probabilities for the extended sequences are computed using 
the previous metrics and the new increment, which is $\ln p(v_{L+2} \mid I_{L+2}, I_{L+1}, \ldots, I_2)$. 
Again, the $M^{L+1}$ sequences are subdivided into $M^L$ groups corresponding to the $M^L$ 
possible states $(I_{L+2}, \ldots, I_3)$ and the most probable sequence from each group is 
selected, while the other $M - 1$ sequences are discarded. 

The procedure described continues with the reception of subsequent signal samples. 
In general, upon reception of $v_{L+k}$, the metrics† 

$$PM_k(\mathbf{I}_{L+k}) = \max_{I_k} \bigl[ \ln p(v_{L+k} \mid I_{L+k}, \ldots, I_k) + PM_{k-1}(\mathbf{I}_{L+k-1}) \bigr] \tag{9.3-23}$$

that are computed give the probabilities of the $M^L$ surviving sequences. Thus, as each 
signal sample is received, the Viterbi algorithm involves first the computation of the 
$M^{L+1}$ probabilities 

$$\ln p(v_{L+k} \mid I_{L+k}, \ldots, I_k) + PM_{k-1}(\mathbf{I}_{L+k-1}) \tag{9.3-24}$$

corresponding to the $M^{L+1}$ sequences that form the continuations of the $M^L$ surviving 
sequences from the previous stage of the process. Then the $M^{L+1}$ sequences are 
subdivided into $M^L$ groups, with each group containing $M$ sequences that terminate in the 
same set of symbols $I_{L+k}, \ldots, I_{k+1}$ and differ in the symbol $I_k$. From each group of 
$M$ sequences, we select the one having the largest probability as indicated by 
Equation 9.3-23, while the remaining $M - 1$ sequences are discarded. Thus, we are left 
again with $M^L$ sequences having the metrics $PM_k(\mathbf{I}_{L+k})$. 
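The recursion described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the text's own implementation: it assumes real-valued taps and symbols, white Gaussian noise (so that maximizing the log-likelihood is the same as minimizing squared Euclidean distance), a known all-zero initial state, and no path truncation. The name `mlse_viterbi` is hypothetical.

```python
def mlse_viterbi(v, f, alphabet):
    """Viterbi MLSE for v_k = sum_j f_j * I_{k-j} + noise, with I_k = 0 for k <= 0.

    Returns the symbol sequence with the largest path metric, where each
    branch contributes -(v_k - expected_output)^2.
    """
    L = len(f) - 1
    start = (0,) * L                      # state = (I_{k-1}, ..., I_{k-L})
    survivors = {start: (0.0, [])}        # state -> (path metric, symbol path)
    for vk in v:
        extended = {}
        for state, (metric, path) in survivors.items():
            for s in alphabet:
                expected = f[0] * s + sum(f[j] * state[j - 1] for j in range(1, L + 1))
                m = metric - (vk - expected) ** 2
                nxt = ((s,) + state)[:L]  # shift the new symbol into the state
                # Keep only the best of the paths merging into each state.
                if nxt not in extended or m > extended[nxt][0]:
                    extended[nxt] = (m, path + [s])
        survivors = extended
    return max(survivors.values(), key=lambda t: t[0])[1]


# Unnormalized duobinary channel (taps 1, 1) with four-level PAM.
received = [1.2, 3.7, 2.1, -4.3, 0.2]     # noiseless outputs [1, 4, 2, -4, 0] plus small noise
decoded = mlse_viterbi(received, [1.0, 1.0], [-3, -1, 1, 3])
```

Each stage extends every survivor by all $M$ symbols ($M^{L+1}$ candidate metrics) and keeps one survivor per state ($M^L$ survivors), mirroring Equations 9.3-23 and 9.3-24.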

As indicated previously, the delay in detecting each information symbol is variable. 
In practice, the variable delay is avoided by truncating the surviving sequences to the 
$q$ most recent symbols, where $q \gg L$, thus achieving a fixed delay. In the case that 
the $M^L$ surviving sequences at time $k$ disagree on the symbol $I_{k-q}$, the symbol in the 
most probable sequence may be chosen. The loss of performance resulting from this 
suboptimum decision procedure is negligible if $q \ge 5L$. 

EXAMPLE 9.3-2. For illustrative purposes, suppose that a duobinary signal pulse is 
employed to transmit four-level ($M = 4$) PAM. Thus, each symbol is a number selected 
from the set $\{-3, -1, 1, 3\}$. The controlled intersymbol interference in this 
partial-response signal is represented by the equivalent discrete-time channel model shown in 
Figure 9.3-4. Suppose we have received $v_1$ and $v_2$, where 

$$v_1 = I_1 + \eta_1, \qquad v_2 = I_2 + I_1 + \eta_2 \tag{9.3-25}$$

and $\{\eta_i\}$ is a sequence of statistically independent zero-mean Gaussian noise variables. 
We may now compute the 16 metrics 

$$PM_1(I_2, I_1) = -\sum_{k=1}^{2} \Bigl( v_k - \sum_{j=0}^{1} I_{k-j} \Bigr)^2, \qquad I_1, I_2 = \pm 1, \pm 3 \tag{9.3-26}$$

where $I_k = 0$ for $k \le 0$. 

†We observe that the metrics $PM_k(\mathbf{I})$ are simply related to the Euclidean distance metrics $DM_k(\mathbf{I})$ when the 
additive noise is Gaussian. 

FIGURE 9.3-4 

Equivalent discrete-time model for intersymbol interference resulting from a duobinary pulse. 

Note that any subsequently received signals $\{v_i\}$ do not involve $I_1$. Hence, at this 
stage, we may discard 12 of the 16 possible pairs $\{I_1, I_2\}$. This step is illustrated by the 
tree diagram shown in Figure 9.3-5. In other words, after computing the 16 metrics 
corresponding to the 16 paths in the tree diagram, we discard three out of the four paths 
that terminate with $I_2 = 3$ and save the most probable of these four. Thus, the metric 
for the surviving path is 

$$PM_1(I_2 = 3, I_1) = \max_{I_1} \Bigl[ -\sum_{k=1}^{2} \Bigl( v_k - \sum_{j=0}^{1} I_{k-j} \Bigr)^2 \Bigr]$$

The process is repeated for each set of four paths terminating with $I_2 = 1$, $I_2 = -1$, 
and $I_2 = -3$. Thus four paths and their corresponding metrics survive after $v_1$ and $v_2$ 
are received. 

When $v_3$ is received, the four paths are extended as shown in Figure 9.3-5 to yield 
16 paths and 16 corresponding metrics given by 

$$PM_2(I_3, I_2, I_1) = PM_1(I_2, I_1) - \Bigl( v_3 - \sum_{j=0}^{1} I_{3-j} \Bigr)^2 \tag{9.3-27}$$

Of the four paths terminating with $I_3 = 3$, we save the most probable. This procedure 
is again repeated for $I_3 = 1$, $I_3 = -1$, and $I_3 = -3$. Consequently, only four paths 
survive at this stage. The procedure is then repeated for each subsequently received 
signal $v_k$ for $k > 3$. 

FIGURE 9.3-5 

Tree diagram for Viterbi decoding of the duobinary pulse. Stage labels: $PM_1(I_2, I_1)$, $PM_2(I_3, I_2, I_1)$, $PM_3(I_4, I_3, I_2, I_1)$. 
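The first pruning step of this example (16 metrics reduced to four survivors) can be reproduced with a short sketch. The function name `first_stage_survivors` and the sample values of $v_1$ and $v_2$ are illustrative assumptions.

```python
def first_stage_survivors(v1, v2, levels=(-3, -1, 1, 3)):
    """For each value of I_2, keep the I_1 that maximizes the metric of Eq. 9.3-26.

    Returns {I_2: (best I_1, surviving metric)} for the duobinary channel,
    where v1 = I_1 + noise and v2 = I_2 + I_1 + noise.
    """
    def metric(i2, i1):
        return -((v1 - i1) ** 2 + (v2 - i2 - i1) ** 2)

    survivors = {}
    for i2 in levels:
        best_i1 = max(levels, key=lambda i1: metric(i2, i1))
        survivors[i2] = (best_i1, metric(i2, best_i1))
    return survivors


# Assumed received samples, close to the noiseless values for I_1 = 1, I_2 = 3.
stage1 = first_stage_survivors(1.1, 3.8)
```

Four entries remain, one for each value of $I_2$; the other 12 of the 16 pairs are discarded, as described in the example.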


9.3-4 Performance of MLSE for Channels with ISI 

We shall now determine the probability of error for the MLSE of the received informa- 
tion sequence when the information is transmitted via PAM and the additive noise is 
Gaussian. The similarity between a convolutional code and a finite-duration intersym- 
bol interference channel implies that the method for computing the error probability 
for the latter carries over from the former. In particular, the method for computing the 
performance of soft-decision decoding of a convolutional code by means of the Viterbi 
algorithm, described in Section 8.3, applies with some modification. 

In PAM signaling with additive Gaussian noise and intersymbol interference, 
the metrics used in the Viterbi algorithm may be expressed as in Equation 9.3-23, or, 
equivalently, as 

$$PM_{k-L}(\mathbf{I}_k) = PM_{k-L-1}(\mathbf{I}_{k-1}) - \Bigl( v_k - \sum_{j=0}^{L} f_j I_{k-j} \Bigr)^2 \tag{9.3-28}$$

where the symbols $\{I_n\}$ may take the values $\pm d, \pm 3d, \ldots, \pm(M-1)d$, and $2d$ is the 
distance between successive levels. The trellis has $M^L$ states, defined at time $k$ as 

$$S_k = (I_{k-1}, I_{k-2}, \ldots, I_{k-L}) \tag{9.3-29}$$

Let the estimated symbols from the Viterbi algorithm be denoted by $\{\tilde{I}_n\}$ and the 
corresponding estimated state at time $k$ by 

$$\tilde{S}_k = (\tilde{I}_{k-1}, \tilde{I}_{k-2}, \ldots, \tilde{I}_{k-L}) \tag{9.3-30}$$

Now suppose that the estimated path through the trellis diverges from the correct path at 
time $k$ and remerges with the correct path at time $k + l$. Thus, $\tilde{S}_k = S_k$ and $\tilde{S}_{k+l} = S_{k+l}$, 
but $\tilde{S}_m \ne S_m$ for $k < m < k + l$. As in a convolutional code, we call this an error 
event. Since the channel spans $L + 1$ symbols, it follows that $l \ge L + 1$. 

For such an error event, we have $\tilde{I}_k \ne I_k$ and $\tilde{I}_{k+l-L-1} \ne I_{k+l-L-1}$, but $\tilde{I}_m = I_m$ 
for $k - L \le m \le k - 1$ and $k + l - L \le m \le k + l - 1$. It is convenient to define an 
error vector $\boldsymbol{\varepsilon}$ corresponding to this error event as 

$$\boldsymbol{\varepsilon} = [\varepsilon_k \;\; \varepsilon_{k+1} \;\; \cdots \;\; \varepsilon_{k+l-L-1}] \tag{9.3-31}$$

where the components of $\boldsymbol{\varepsilon}$ are defined as 

$$\varepsilon_j = \frac{1}{2d}(I_j - \tilde{I}_j), \qquad j = k, k+1, \ldots, k+l-L-1 \tag{9.3-32}$$

The normalization factor of $2d$ in Equation 9.3-32 results in elements $\varepsilon_j$ that take on 
the values $0, \pm 1, \pm 2, \pm 3, \ldots, \pm(M-1)$. Moreover, the error vector is characterized 
by the properties that $\varepsilon_k \ne 0$, $\varepsilon_{k+l-L-1} \ne 0$, and there is no sequence of $L$ consecutive 
elements that are zero. Associated with the error vector in Equation 9.3-31 is the 
polynomial of degree $l - L - 1$, 

$$\varepsilon(z) = \varepsilon_k + \varepsilon_{k+1} z^{-1} + \varepsilon_{k+2} z^{-2} + \cdots + \varepsilon_{k+l-L-1} z^{-(l-L-1)} \tag{9.3-33}$$

We wish to determine the probability of occurrence of the error event that begins 
at time $k$ and is characterized by the error vector $\boldsymbol{\varepsilon}$ given in Equation 9.3-31 or, 
equivalently, by the polynomial given in Equation 9.3-33. To accomplish this, we follow the 
procedure developed by Forney (1972). Specifically, for the error event $\boldsymbol{\varepsilon}$ to occur, the 
following three subevents $E_1$, $E_2$, and $E_3$ must occur: 

$E_1$: At time $k$, $\tilde{S}_k = S_k$. 

$E_2$: The information symbols $I_k, I_{k+1}, \ldots, I_{k+l-L-1}$ when added to the scaled 
error sequence $2d(\varepsilon_k, \varepsilon_{k+1}, \ldots, \varepsilon_{k+l-L-1})$ must result in an allowable 
sequence, i.e., the sequence $\tilde{I}_k, \tilde{I}_{k+1}, \ldots, \tilde{I}_{k+l-L-1}$ must have values selected 
from $\pm d, \pm 3d, \ldots, \pm(M-1)d$. 

$E_3$: For $k \le m < k + l$, the sum of the branch metrics of the estimated path 
exceeds the sum of the branch metrics of the correct path. 

The probability of occurrence of $E_3$ is 

$$P(E_3) = P\Bigl[ \sum_{i=k}^{k+l-1} \Bigl( v_i - \sum_{j=0}^{L} f_j \tilde{I}_{i-j} \Bigr)^2 < \sum_{i=k}^{k+l-1} \Bigl( v_i - \sum_{j=0}^{L} f_j I_{i-j} \Bigr)^2 \Bigr] \tag{9.3-34}$$


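But 

$$v_i = \sum_{j=0}^{L} f_j I_{i-j} + \eta_i \tag{9.3-35}$$

where $\{\eta_i\}$ is a real-valued white Gaussian noise sequence. Substitution of 
Equation 9.3-35 into Equation 9.3-34 yields 

$$\begin{aligned} P(E_3) &= P\Bigl[ \sum_{i=k}^{k+l-1} \Bigl( \eta_i + 2d \sum_{j=0}^{L} f_j \varepsilon_{i-j} \Bigr)^2 < \sum_{i=k}^{k+l-1} \eta_i^2 \Bigr] \\ &= P\Bigl[ 4d \sum_{i=k}^{k+l-1} \eta_i \Bigl( \sum_{j=0}^{L} f_j \varepsilon_{i-j} \Bigr) < -4d^2 \sum_{i=k}^{k+l-1} \Bigl( \sum_{j=0}^{L} f_j \varepsilon_{i-j} \Bigr)^2 \Bigr] \end{aligned} \tag{9.3-36}$$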


where $\varepsilon_j = 0$ for $j < k$ and $j > k + l - L - 1$. If we define 

$$\alpha_i = \sum_{j=0}^{L} f_j \varepsilon_{i-j} \tag{9.3-37}$$

then Equation 9.3-36 may be expressed as 

$$P(E_3) = P\Bigl[ \sum_{i=k}^{k+l-1} \alpha_i \eta_i < -d \sum_{i=k}^{k+l-1} \alpha_i^2 \Bigr] \tag{9.3-38}$$

where the factor of $4d$ common to both terms has been dropped. Now Equation 9.3-38 
is just the probability that a linear combination of statistically independent Gaussian 
random variables is less than some negative number. Thus 

$$P(E_3) = Q\Biggl( \sqrt{ \frac{2d^2}{N_0} \sum_{i=k}^{k+l-1} \alpha_i^2 } \Biggr) \tag{9.3-39}$$


For convenience, we define 

$$\delta^2(\boldsymbol{\varepsilon}) = \sum_{i=k}^{k+l-1} \alpha_i^2 = \sum_{i=k}^{k+l-1} \Bigl( \sum_{j=0}^{L} f_j \varepsilon_{i-j} \Bigr)^2 \tag{9.3-40}$$

where $\varepsilon_j = 0$ for $j < k$ and $j > k + l - L - 1$. Note that the $\{\alpha_i\}$ resulting from the 
convolution of $\{f_j\}$ with $\{\varepsilon_j\}$ are the coefficients of the polynomial 

$$\alpha(z) = F(z)\varepsilon(z) = \alpha_k + \alpha_{k+1} z^{-1} + \cdots + \alpha_{k+l-1} z^{-(l-1)} \tag{9.3-41}$$

Furthermore, $\delta^2(\boldsymbol{\varepsilon})$ is simply equal to the coefficient of $z^0$ in the polynomial 

$$\alpha(z)\alpha(z^{-1}) = F(z)F(z^{-1})\varepsilon(z)\varepsilon(z^{-1}) = X(z)\varepsilon(z)\varepsilon(z^{-1}) \tag{9.3-42}$$

We call $\delta^2(\boldsymbol{\varepsilon})$ the Euclidean weight of the error event $\boldsymbol{\varepsilon}$. 
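Both routes to the Euclidean weight, the direct convolution of Equation 9.3-40 and the $z^0$ coefficient of Equation 9.3-42, can be compared numerically. The sketch below assumes real-valued taps; the helper names and the sample channel are illustrative choices.

```python
def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def euclidean_weight(f, eps):
    """delta^2 = sum_i alpha_i^2 with alpha = f * eps (Eq. 9.3-40)."""
    return sum(x * x for x in convolve(f, eps))


def euclidean_weight_via_x(f, eps):
    """z^0 coefficient of X(z) eps(z) eps(1/z) (Eq. 9.3-42), real taps assumed."""
    def acorr(s, m):
        m = abs(m)
        return sum(s[n] * s[n + m] for n in range(len(s) - m))

    L = len(f) - 1
    return sum(acorr(f, m) * acorr(eps, m) for m in range(-L, L + 1))


f = [0.5, 0.5 ** 0.5, 0.5]   # a unit-energy three-tap channel (assumed for illustration)
eps = [1, -1]                # error event eps(z) = 1 - z^{-1}
w_direct = euclidean_weight(f, eps)
w_via_x = euclidean_weight_via_x(f, eps)
```

For this channel and error event both computations give $2 - \sqrt{2} \approx 0.586$.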

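An alternative method for representing the result of convolving $\{f_j\}$ with $\{\varepsilon_j\}$ is 
the matrix form 

$$\boldsymbol{\alpha} = \mathbf{e}\mathbf{f}$$

where $\boldsymbol{\alpha}$ is an $l$-dimensional vector, $\mathbf{f}$ is an $(L + 1)$-dimensional vector, and $\mathbf{e}$ is an 
$l \times (L + 1)$ matrix defined as 

$$\boldsymbol{\alpha} = \begin{bmatrix} \alpha_k \\ \alpha_{k+1} \\ \vdots \\ \alpha_{k+l-1} \end{bmatrix}, \qquad \mathbf{f} = \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_L \end{bmatrix}, \qquad \mathbf{e} = \begin{bmatrix} \varepsilon_k & 0 & 0 & \cdots & 0 \\ \varepsilon_{k+1} & \varepsilon_k & 0 & \cdots & 0 \\ \varepsilon_{k+2} & \varepsilon_{k+1} & \varepsilon_k & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ \varepsilon_{k+l-1} & \varepsilon_{k+l-2} & \varepsilon_{k+l-3} & \cdots & \varepsilon_{k+l-L-1} \end{bmatrix} \tag{9.3-43}$$

Then 

$$\delta^2(\boldsymbol{\varepsilon}) = \boldsymbol{\alpha}^t \boldsymbol{\alpha} = \mathbf{f}^t \mathbf{e}^t \mathbf{e} \mathbf{f} = \mathbf{f}^t \mathbf{A} \mathbf{f} \tag{9.3-44}$$

where $\mathbf{A}$ is an $(L + 1) \times (L + 1)$ matrix of the form 

$$\mathbf{A} = \mathbf{e}^t \mathbf{e} = \begin{bmatrix} \beta_0 & \beta_1 & \beta_2 & \cdots & \beta_L \\ \beta_1 & \beta_0 & \beta_1 & \cdots & \beta_{L-1} \\ \beta_2 & \beta_1 & \beta_0 & \cdots & \beta_{L-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \beta_L & \beta_{L-1} & \beta_{L-2} & \cdots & \beta_0 \end{bmatrix} \tag{9.3-45}$$

and 

$$\beta_m = \sum_{i=k}^{k+l-1-m} \varepsilon_i \varepsilon_{i+m} \tag{9.3-46}$$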

We may use either Equations 9.3-40 and 9.3-41 or Equations 9.3-45 and 9.3-46 in 
evaluating the error rate performance. We consider these computations later. For now 
we conclude that the probability of the subevent $E_3$, given by Equation 9.3-39, may 
be expressed as 

$$P(E_3) = Q\Biggl( \sqrt{ \frac{2d^2}{N_0} \delta^2(\boldsymbol{\varepsilon}) } \Biggr) = Q\Biggl( \sqrt{ \frac{6}{M^2 - 1} \gamma_{av} \delta^2(\boldsymbol{\varepsilon}) } \Biggr) \tag{9.3-47}$$

where we have used the relation 

$$d^2 = \frac{3}{M^2 - 1} T P_{av} \tag{9.3-48}$$

to eliminate $d^2$, and $\gamma_{av} = T P_{av}/N_0$. Note that, in the absence of intersymbol 
interference, $\delta^2(\boldsymbol{\varepsilon}) = 1$ and $P(E_3)$ is proportional to the symbol error probability of $M$-ary 
PAM. 
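The quadratic form $\mathbf{f}^t \mathbf{A} \mathbf{f}$, with $\mathbf{A}$ built from the error-sequence correlations $\beta_m$, can be sketched as follows. The function names are hypothetical, and real-valued taps and error components are assumed.

```python
def error_matrix(eps, L):
    """Toeplitz matrix A with A[p][q] = beta_{|p-q|}, beta_m = sum_i eps_i * eps_{i+m}."""
    beta = [sum(eps[i] * eps[i + m] for i in range(len(eps) - m)) for m in range(L + 1)]
    return [[beta[abs(p - q)] for q in range(L + 1)] for p in range(L + 1)]


def quadratic_form(f, A):
    """Evaluate f^t A f for a real tap vector f."""
    n = len(f)
    return sum(f[p] * A[p][q] * f[q] for p in range(n) for q in range(n))


A = error_matrix([1, -1], L=2)                  # error event eps(z) = 1 - z^{-1}
delta2 = quadratic_form([0.5, 0.5 ** 0.5, 0.5], A)
```

For the error event $1 - z^{-1}$ and $L = 2$ this produces the tridiagonal matrix with 2 on the diagonal and $-1$ on the adjacent diagonals, and the sample channel gives $\delta^2 = 2 - \sqrt{2}$.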

The probability of the subevent E 2 depends only on the statistical properties of 
the input sequence. We assume that the information symbols are equally probable and 
that the symbols in the transmitted sequence are statistically independent. Then, for an 
error of the form $|\varepsilon_i| = j$, $j = 1, 2, \ldots, M - 1$, there are $M - j$ possible values of $I_i$ 
such that 

$$\tilde{I}_i = I_i + 2d\varepsilon_i$$

Hence 

$$P(E_2) = \prod_{i=0}^{l-L-1} \frac{M - |\varepsilon_i|}{M} \tag{9.3-49}$$

The probability of the subevent $E_1$ is much more difficult to compute exactly 
because of its dependence on the subevent $E_3$. That is, we must compute $P(E_1 \mid E_3)$. 
However, $P(E_1 \mid E_3) = 1 - P_e$, where $P_e$ is the symbol error probability. Hence $P(E_1 \mid E_3)$ 
is well approximated (and upper-bounded) by unity for reasonably low symbol error 
probabilities. Therefore, the probability of the error event $\boldsymbol{\varepsilon}$ is well approximated and 
upper-bounded as 

$$P(\boldsymbol{\varepsilon}) \le Q\Biggl( \sqrt{ \frac{6}{M^2 - 1} \gamma_{av} \delta^2(\boldsymbol{\varepsilon}) } \Biggr) \prod_{i=0}^{l-L-1} \frac{M - |\varepsilon_i|}{M} \tag{9.3-50}$$


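Let $E$ be the set of all error events $\boldsymbol{\varepsilon}$ starting at time $k$ and let $w(\boldsymbol{\varepsilon})$ be the 
corresponding number of nonzero components (Hamming weight or number of symbol 
errors) in each error event $\boldsymbol{\varepsilon}$. Then the probability of a symbol error is upper-bounded 
(union bound) as 

$$P_e \le \sum_{\boldsymbol{\varepsilon} \in E} w(\boldsymbol{\varepsilon}) P(\boldsymbol{\varepsilon}) \le \sum_{\boldsymbol{\varepsilon} \in E} w(\boldsymbol{\varepsilon})\, Q\Biggl( \sqrt{ \frac{6}{M^2 - 1} \gamma_{av} \delta^2(\boldsymbol{\varepsilon}) } \Biggr) \prod_{i=0}^{l-L-1} \frac{M - |\varepsilon_i|}{M} \tag{9.3-51}$$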


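Now let $D$ be the set of all $\delta(\boldsymbol{\varepsilon})$. For each $\delta \in D$, let $E_\delta$ be the subset of error events 
for which $\delta(\boldsymbol{\varepsilon}) = \delta$. Then Equation 9.3-51 may be expressed as 

$$P_e \le \sum_{\delta \in D} Q\Biggl( \sqrt{ \frac{6}{M^2 - 1} \gamma_{av} \delta^2 } \Biggr) \Biggl[ \sum_{\boldsymbol{\varepsilon} \in E_\delta} w(\boldsymbol{\varepsilon}) \prod_{i=0}^{l-L-1} \frac{M - |\varepsilon_i|}{M} \Biggr] \le \sum_{\delta \in D} K_\delta\, Q\Biggl( \sqrt{ \frac{6}{M^2 - 1} \gamma_{av} \delta^2 } \Biggr) \tag{9.3-52}$$

where 

$$K_\delta = \sum_{\boldsymbol{\varepsilon} \in E_\delta} w(\boldsymbol{\varepsilon}) \prod_{i=0}^{l-L-1} \frac{M - |\varepsilon_i|}{M} \tag{9.3-53}$$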


The expression for the error probability in Equation 9.3-52 is similar to the form 
of the error probability for a convolutional code with soft-decision decoding given 
by Equation 8.2-19. The weighting factors $\{K_\delta\}$ may be determined by means of the 
error state diagram, which is akin to the state diagram of a convolutional encoder. This 
approach has been illustrated by Forney (1972) and Viterbi and Omura (1979). 

In general, however, the use of the error state diagram for computing $P_e$ is tedious. 
Instead, we may simplify the computation of $P_e$ by focusing on the dominant term in the 
summation of Equation 9.3-52. Because of the exponential dependence of each term 
in the sum, the expression for $P_e$ is dominated by the term corresponding to the minimum 
value of $\delta$, denoted as $\delta_{\min}$. Hence the symbol error probability may be approximated 
as 

$$P_e \approx K_{\delta_{\min}}\, Q\Biggl( \sqrt{ \frac{6}{M^2 - 1} \gamma_{av} \delta^2_{\min} } \Biggr) \tag{9.3-54}$$

where 

$$K_{\delta_{\min}} = \sum_{\boldsymbol{\varepsilon} \in E_{\delta_{\min}}} w(\boldsymbol{\varepsilon}) \prod_{i=0}^{l-L-1} \frac{M - |\varepsilon_i|}{M} \tag{9.3-55}$$

In general, $\delta^2_{\min} \le 1$. Hence, $10 \log \delta^2_{\min}$ represents the loss in SNR due to intersymbol 
interference. 
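The dominant-term approximation and the associated SNR loss are straightforward to evaluate. The sketch below uses the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the function names are illustrative, and the values of $K_{\delta_{\min}}$ and $\delta^2_{\min}$ must be supplied by the caller.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def symbol_error_approx(M, gamma_av, delta2_min, k_min):
    """Dominant-term approximation P_e ~ K * Q(sqrt(6/(M^2-1) * gamma_av * delta2_min))."""
    return k_min * q_function(math.sqrt(6.0 / (M * M - 1) * gamma_av * delta2_min))


def snr_loss_db(delta2_min):
    """SNR loss -10 log10(delta2_min) in dB relative to the ISI-free case."""
    return -10.0 * math.log10(delta2_min)
```

For instance, `snr_loss_db(2 - math.sqrt(2))` evaluates to roughly 2.3 dB, and a smaller $\delta^2_{\min}$ always gives a larger approximate error probability at the same $\gamma_{av}$.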

The minimum value of $\delta$ may be determined either from Equation 9.3-40 or from 
evaluation of the quadratic form in Equation 9.3-44 for different error sequences. In 
the following two examples we use Equation 9.3-40. 

EXAMPLE 9.3-3. Consider a two-path channel ($L = 1$) with arbitrary coefficients $f_0$ 
and $f_1$ satisfying the constraint $f_0^2 + f_1^2 = 1$. The channel characteristic is 

$$F(z) = f_0 + f_1 z^{-1} \tag{9.3-56}$$

For an error event of length $n$, 

$$\varepsilon(z) = \varepsilon_0 + \varepsilon_1 z^{-1} + \cdots + \varepsilon_{n-1} z^{-(n-1)}, \qquad n \ge 1 \tag{9.3-57}$$

The product $\alpha(z) = F(z)\varepsilon(z)$ may be expressed as 

$$\alpha(z) = \alpha_0 + \alpha_1 z^{-1} + \cdots + \alpha_n z^{-n} \tag{9.3-58}$$

where $\alpha_0 = \varepsilon_0 f_0$ and $\alpha_n = f_1 \varepsilon_{n-1}$. Since $\varepsilon_0 \ne 0$, $\varepsilon_{n-1} \ne 0$, and 

$$\delta^2(\boldsymbol{\varepsilon}) = \sum_{k=0}^{n} \alpha_k^2 \tag{9.3-59}$$

it follows that 

$$\delta^2_{\min} \ge f_0^2 + f_1^2 = 1$$

Indeed, $\delta^2_{\min} = 1$ when a single error occurs, i.e., $\varepsilon(z) = \varepsilon_0$. Thus, we conclude that 
there is no loss in SNR in maximum-likelihood sequence estimation of the information 
symbols when the channel dispersion has length 2. 

EXAMPLE 9.3-4. The controlled intersymbol interference in a partial-response signal 
may be viewed as having been generated by a time-dispersive channel. Thus, the 
intersymbol interference from a duobinary pulse may be represented by the (normalized) 
channel characteristic 

$$F(z) = \sqrt{\tfrac{1}{2}} + \sqrt{\tfrac{1}{2}}\, z^{-1} \tag{9.3-60}$$

Similarly, the representation for a modified duobinary pulse is 

$$F(z) = \sqrt{\tfrac{1}{2}} - \sqrt{\tfrac{1}{2}}\, z^{-2} \tag{9.3-61}$$

The minimum distance $\delta^2_{\min} = 1$ for any error event of the alternating-sign form 

$$\varepsilon(z) = \pm\bigl(1 - z^{-1} + z^{-2} - \cdots + (-1)^{n-1} z^{-(n-1)}\bigr), \qquad n \ge 1 \tag{9.3-62}$$

for the channel given by Equation 9.3-60, since the interior terms of the product 
$F(z)\varepsilon(z)$ cancel and 

$$\alpha(z) = \pm\sqrt{\tfrac{1}{2}}\, \bigl(1 + (-1)^{n-1} z^{-n}\bigr)$$

Similarly, when 

$$\varepsilon(z) = \pm\bigl(1 + z^{-2} + z^{-4} + \cdots + z^{-2(n-1)}\bigr), \qquad n \ge 1 \tag{9.3-63}$$

$\delta^2_{\min} = 1$ for the channel given by Equation 9.3-61, since 

$$\alpha(z) = \pm\sqrt{\tfrac{1}{2}}\, \bigl(1 - z^{-2n}\bigr)$$

Hence the MLSE of these two partial-response signals results in no loss in SNR. In 
contrast, the suboptimum symbol-by-symbol detection described previously resulted 
in a 2.1-dB loss. 

The constant $K_{\delta_{\min}}$ is easily evaluated for these two signals. With precoding, the 
number of output symbol errors (Hamming weight) associated with the error events in 
Equations 9.3-62 and 9.3-63 is two. Hence, 

$$K_{\delta_{\min}} = 2 \sum_{n=1}^{\infty} \Bigl( \frac{M - 1}{M} \Bigr)^n = 2(M - 1) \tag{9.3-64}$$

On the other hand, without precoding, these error events result in $n$ symbol errors, and, 
hence, 

$$K_{\delta_{\min}} = 2 \sum_{n=1}^{\infty} n \Bigl( \frac{M - 1}{M} \Bigr)^n = 2M(M - 1) \tag{9.3-65}$$
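The unit Euclidean weight of these error events, and the closed forms of the two series for $K_{\delta_{\min}}$, can be checked numerically. The sketch below takes the duobinary error event with alternating signs, $1 - z^{-1} + z^{-2} - \cdots$, which is the sign pattern for which the interior terms of $F(z)\varepsilon(z)$ cancel; all names are illustrative, and the infinite sums are truncated at an assumed number of terms.

```python
def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def euclidean_weight(f, eps):
    return sum(x * x for x in convolve(f, eps))


r = 2 ** -0.5
duobinary = [r, r]                     # F(z) = sqrt(1/2) (1 + z^-1)
modified_duobinary = [r, 0.0, -r]      # F(z) = sqrt(1/2) (1 - z^-2)


def alternating_event(n):
    """Coefficients of 1 - z^-1 + z^-2 - ... (n terms)."""
    return [(-1) ** i for i in range(n)]


def spaced_event(n):
    """Coefficients of 1 + z^-2 + ... + z^-2(n-1)."""
    return [1 if i % 2 == 0 else 0 for i in range(2 * n - 1)]


def k_with_precoding(M, terms=2000):
    return 2 * sum(((M - 1) / M) ** n for n in range(1, terms + 1))


def k_without_precoding(M, terms=2000):
    return 2 * sum(n * ((M - 1) / M) ** n for n in range(1, terms + 1))
```

Every event length $n$ gives weight 1 for both channels, and the truncated sums approach $2(M-1)$ and $2M(M-1)$.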

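As a final exercise, we consider the evaluation of $\delta^2_{\min}$ from the quadratic form in 
Equation 9.3-44. The matrix $\mathbf{A}$ of the quadratic form is positive-definite; hence, all 
its eigenvalues are positive. If $\{\mu_k(\boldsymbol{\varepsilon})\}$ are the eigenvalues and $\{\mathbf{v}_k(\boldsymbol{\varepsilon})\}$ are the 
corresponding orthonormal eigenvectors of $\mathbf{A}$ for an error event $\boldsymbol{\varepsilon}$, then the quadratic form 
in Equation 9.3-44 can be expressed as 

$$\delta^2(\boldsymbol{\varepsilon}) = \sum_{k=1}^{L+1} \mu_k(\boldsymbol{\varepsilon}) \bigl[ \mathbf{f}^t \mathbf{v}_k(\boldsymbol{\varepsilon}) \bigr]^2 \tag{9.3-66}$$

In other words, $\delta^2(\boldsymbol{\varepsilon})$ is expressed as a linear combination of the squared projections 
of the channel vector $\mathbf{f}$ onto the eigenvectors of $\mathbf{A}$. Each squared projection in the sum 
is weighted by the corresponding eigenvalue $\mu_k(\boldsymbol{\varepsilon})$, $k = 1, 2, \ldots, L + 1$. Then 

$$\delta^2_{\min} = \min_{\boldsymbol{\varepsilon}} \delta^2(\boldsymbol{\varepsilon}) \tag{9.3-67}$$

It is interesting to note that the worst channel characteristic of a given length $L + 1$ 
can be obtained by finding the eigenvector corresponding to the minimum eigenvalue. 
Thus, if $\mu_{\min}(\boldsymbol{\varepsilon})$ is the minimum eigenvalue for a given error event $\boldsymbol{\varepsilon}$ and $\mathbf{v}_{\min}(\boldsymbol{\varepsilon})$ is the 
corresponding eigenvector, then 

$$\mu_{\min} = \min_{\boldsymbol{\varepsilon}} \mu_{\min}(\boldsymbol{\varepsilon}), \qquad \mathbf{f} = \min_{\boldsymbol{\varepsilon}} \mathbf{v}_{\min}(\boldsymbol{\varepsilon})$$

where the second expression denotes the eigenvector $\mathbf{v}_{\min}(\boldsymbol{\varepsilon})$ of the error event that 
attains $\mu_{\min}$, and 

$$\delta^2_{\min} = \mu_{\min}$$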

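EXAMPLE 9.3-5. Let us determine the worst time-dispersive channel of length 
3 ($L = 2$) by finding the minimum eigenvalue of $\mathbf{A}$ for different error events. Thus, 

$$F(z) = f_0 + f_1 z^{-1} + f_2 z^{-2}$$

where $f_0$, $f_1$, and $f_2$ are the components of the eigenvector of $\mathbf{A}$ corresponding to the 
minimum eigenvalue. An error event of the form 

$$\varepsilon(z) = 1 - z^{-1}$$

results in a matrix 

$$\mathbf{A} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$

which has the eigenvalues $\mu_1 = 2$, $\mu_2 = 2 + \sqrt{2}$, $\mu_3 = 2 - \sqrt{2}$. The eigenvector 
corresponding to $\mu_3$ is 

$$\mathbf{v}_3^t = \Bigl[ \tfrac{1}{2} \quad \sqrt{\tfrac{1}{2}} \quad \tfrac{1}{2} \Bigr] \tag{9.3-68}$$

We may also consider the dual error event 

$$\varepsilon(z) = 1 + z^{-1}$$

which results in the matrix 

$$\mathbf{A} = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}$$

This matrix has eigenvalues identical to those of the one for $\varepsilon(z) = 1 - z^{-1}$. The 
corresponding eigenvector for $\mu_3 = 2 - \sqrt{2}$ is 

$$\mathbf{v}_3^t = \Bigl[ -\tfrac{1}{2} \quad \sqrt{\tfrac{1}{2}} \quad -\tfrac{1}{2} \Bigr] \tag{9.3-69}$$

Any other error events lead to larger values for $\mu_{\min}$. Hence, $\mu_{\min} = 2 - \sqrt{2}$ and 
the worst-case channel is either 

$$\Bigl[ \tfrac{1}{2} \quad \sqrt{\tfrac{1}{2}} \quad \tfrac{1}{2} \Bigr] \qquad \text{or} \qquad \Bigl[ -\tfrac{1}{2} \quad \sqrt{\tfrac{1}{2}} \quad -\tfrac{1}{2} \Bigr]$$

The loss in SNR from the channel is 

$$-10 \log \delta^2_{\min} = -10 \log \mu_{\min} = 2.3 \text{ dB}$$

Repetitions of the above computation for channels with $L = 3$, 4, and 5 yield the 
results given in Table 9.3-1. 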
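The eigenpair quoted in this example and the resulting 2.3-dB loss can be verified directly. The sketch below checks the eigenvalue equation $\mathbf{A}\mathbf{v} = \mu\mathbf{v}$ rather than running a general eigensolver; the variable and helper names are illustrative.

```python
import math

A_minus = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]   # error event 1 - z^-1
A_plus = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]        # error event 1 + z^-1
mu_min = 2 - math.sqrt(2)
v_minus = [0.5, math.sqrt(0.5), 0.5]
v_plus = [-0.5, math.sqrt(0.5), -0.5]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def eigen_residual(A, v, mu):
    """Largest component of |A v - mu v|; zero for an exact eigenpair."""
    return max(abs(y - mu * xi) for y, xi in zip(matvec(A, v), v))


residual_minus = eigen_residual(A_minus, v_minus, mu_min)
residual_plus = eigen_residual(A_plus, v_plus, mu_min)
loss_db = -10 * math.log10(mu_min)
energy = sum(x * x for x in v_minus)   # the worst-case channel has unit energy
```

Both residuals vanish to machine precision and `loss_db` is approximately 2.32, in agreement with the first row of Table 9.3-1.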


TABLE 9.3-1 

Maximum Performance Loss and Corresponding Channel Characteristics 

Channel length    Performance loss                      Minimum-distance channel 
$L + 1$           $-10 \log \delta^2_{\min}$ (dB) 

3                 2.3                                   0.50, 0.71, 0.50 
4                 4.2                                   0.38, 0.60, 0.60, 0.38 
5                 5.7                                   0.29, 0.50, 0.58, 0.50, 0.29 
6                 7.0                                   0.23, 0.42, 0.52, 0.52, 0.42, 0.23 

The MLSE for a channel with ISI has a computational complexity that grows 
exponentially with the length of the channel time dispersion. If the size of the symbol alphabet 
is $M$ and the number of interfering symbols contributing to ISI is $L$, the Viterbi 
algorithm computes $M^{L+1}$ metrics for each new received symbol. In most channels of 
practical interest, such a large computational complexity is prohibitively expensive to 
implement. 

In this and the following sections, we describe suboptimum channel equalization 
approaches to compensate for the ISI. One approach employs a linear transversal filter, 
which is described in this section. This filter structure has a computational complexity 
that is a linear function of the channel dispersion length L. 

The linear filter most often used for equalization is the transversal filter shown in 
Figure 9.4-1. Its input is the sequence $\{v_k\}$ given in Equation 9.3-16 and its output is 
the estimate of the information sequence $\{I_k\}$. The estimate of the $k$th symbol may be 
expressed as 

$$\hat{I}_k = \sum_{j=-K}^{K} c_j v_{k-j} \tag{9.4-1}$$

where $\{c_j\}$ are the $2K + 1$ complex-valued tap weight coefficients of the filter. The 
estimate $\hat{I}_k$ is quantized to the nearest (in distance) information symbol to form the 
decision $\tilde{I}_k$. If $\tilde{I}_k$ is not identical to the transmitted information symbol $I_k$, an error has 
been made. 
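The estimate and decision steps can be sketched as follows. This is an illustration only: the tap list stores $c_j$ at index $j + K$, and samples outside the received block are treated as zero, which is an assumption of the sketch rather than something the text specifies. The function names are hypothetical.

```python
def equalizer_output(v, c, K, k):
    """Estimate of I_k: sum over j = -K..K of c_j * v_{k-j}, with c_j stored at c[j + K]."""
    total = 0
    for j in range(-K, K + 1):
        idx = k - j
        if 0 <= idx < len(v):          # samples outside the block are taken as zero
            total += c[j + K] * v[idx]
    return total


def decide(estimate, alphabet):
    """Quantize the estimate to the nearest symbol of the alphabet."""
    return min(alphabet, key=lambda s: abs(estimate - s))


v = [1.0, -1.0, 3.0]
estimate = equalizer_output(v, [0.0, 1.0, 0.0], K=1, k=1)   # pass-through equalizer
decision = decide(2.6, [-3, -1, 1, 3])
```

With the pass-through taps the estimate is simply $v_1$, and an estimate of 2.6 is quantized to the level 3.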

Considerable research has been performed on the criterion for optimizing the filter 
coefficients $\{c_j\}$. Since the most meaningful measure of performance for a digital 
communication system is the average probability of error, it is desirable to choose the 
coefficients to minimize this performance index. However, the probability of error is a highly 
nonlinear function of $\{c_j\}$. Consequently, the probability of error as a performance 
index for optimizing the tap weight coefficients of the equalizer is computationally 
complex. 

FIGURE 9.4-1 

Linear transversal filter. 

Two criteria have found widespread use in optimizing the equalizer coefficients 
$\{c_j\}$. One is the peak distortion criterion and the other is the mean-square-error criterion. 


9.4-1 Peak Distortion Criterion 


The peak distortion is simply defined as the worst-case intersymbol interference at the 
output of the equalizer. The minimization of this performance index is called the peak 
distortion criterion. First we consider the minimization of the peak distortion assuming 
that the equalizer has an infinite number of taps. Then we shall discuss the case in which 
the transversal equalizer spans a finite time duration. 

We observe that the cascade of the discrete-time linear filter model having an 
impulse response $\{f_n\}$ and an equalizer having an impulse response $\{c_n\}$ can be represented 
by a single equivalent filter having the impulse response 

$$q_n = \sum_{j=-\infty}^{\infty} c_j f_{n-j} \tag{9.4-2}$$

That is, $\{q_n\}$ is simply the convolution of $\{c_n\}$ and $\{f_n\}$. The equalizer is assumed to 
have an infinite number of taps. Its output at the $k$th sampling instant can be expressed 
in the form 

$$\hat{I}_k = q_0 I_k + \sum_{n \ne k} I_n q_{k-n} + \sum_{j=-\infty}^{\infty} c_j \eta_{k-j} \tag{9.4-3}$$

The first term in Equation 9.4-3 represents a scaled version of the desired 
symbol. For convenience, we normalize $q_0$ to unity. The second term is the intersymbol 
interference. The peak value of this interference, which is called the peak distortion, is 

$$\mathcal{D}(\mathbf{c}) = \sum_{\substack{n=-\infty \\ n \ne 0}}^{\infty} |q_n| = \sum_{\substack{n=-\infty \\ n \ne 0}}^{\infty} \Bigl| \sum_{j=-\infty}^{\infty} c_j f_{n-j} \Bigr| \tag{9.4-4}$$

Thus, $\mathcal{D}(\mathbf{c})$ is a function of the equalizer tap weights. 
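For finite-length sequences the peak distortion can be evaluated directly from the combined response. In the sketch below the function names are illustrative, `center` is the index of $q_0$ in the combined response, and the division by $|q_0|$ implements the normalization of $q_0$ to unity.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def peak_distortion(c, f, center):
    """Sum of |q_n| over n != 0, normalized by |q_0|, where q = c * f."""
    q = convolve(c, f)
    isi = sum(abs(x) for i, x in enumerate(q) if i != center)
    return isi / abs(q[center])


f = [1.0, 0.5]                       # assumed two-tap channel
d_unequalized = peak_distortion([1.0], f, center=0)
d_equalized = peak_distortion([1.0, -0.5, 0.25], f, center=0)
```

Here the three-tap equalizer reduces the peak distortion from 0.5 to 0.125; the residual term comes from truncating the equalizer to a finite length.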

With an equalizer having an infinite number of taps, it is possible to select the 
tap weights so that $\mathcal{D}(\mathbf{c}) = 0$, i.e., $q_n = 0$ for all $n$ except $n = 0$. That is, the 
intersymbol interference can be completely eliminated. The values of the tap weights 
for accomplishing this goal are determined from the condition 

$$q_n = \sum_{j=-\infty}^{\infty} c_j f_{n-j} = \begin{cases} 1 & (n = 0) \\ 0 & (n \ne 0) \end{cases} \tag{9.4-5}$$

FIGURE 9.4-2 

Block diagram of channel with zero-forcing equalizer. 

By taking the z transform of Equation 9.4-5, we obtain 

$$Q(z) = C(z)F(z) = 1 \tag{9.4-6}$$

or, simply, 

$$C(z) = \frac{1}{F(z)} \tag{9.4-7}$$

where $C(z)$ denotes the z transform of the $\{c_j\}$. Note that the equalizer, with transfer 
function $C(z)$, is simply the inverse filter to the linear filter model $F(z)$. In other words, 
complete elimination of the intersymbol interference requires the use of an inverse 
filter to $F(z)$. We call such a filter a zero-forcing filter. Figure 9.4-2 illustrates in block 
diagram the equivalent discrete-time channel and equalizer. 
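When $F(z)$ is causal and minimum phase, the leading taps of the inverse filter $1/F(z)$ follow from the condition $\sum_j c_j f_{n-j} = \delta_{n0}$ by forward substitution. The sketch below assumes exactly that case (causal taps, $f_0 \ne 0$); the function name is hypothetical, and truncating the expansion to a finite number of taps leaves some residual interference.

```python
def zero_forcing_taps(f, num_taps):
    """First num_taps causal coefficients of 1/F(z), from sum_j c_j f_{n-j} = delta_n."""
    c = []
    for n in range(num_taps):
        acc = 1.0 if n == 0 else 0.0
        for j in range(1, min(n, len(f) - 1) + 1):
            acc -= f[j] * c[n - j]
        c.append(acc / f[0])
    return c


taps = zero_forcing_taps([1.0, 0.5], 4)
```

For $F(z) = 1 + 0.5 z^{-1}$ the taps are the geometric sequence $(-0.5)^n$, as expected from the series expansion of $1/(1 + 0.5 z^{-1})$.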

The cascade of the noise-whitening filter having the transfer function $1/F^*(1/z^*)$ 
and the zero-forcing equalizer having the transfer function $1/F(z)$ results in an 
equivalent zero-forcing equalizer having the transfer function 

$$C'(z) = \frac{1}{F(z)F^*(1/z^*)} = \frac{1}{X(z)} \tag{9.4-8}$$

as shown in Figure 9.4-3. This combined filter has as its input the sequence $\{y_k\}$ of 
samples from the matched filter, given by Equation 9.3-10. Its output consists of the 
desired symbols corrupted only by additive zero-mean Gaussian noise. The impulse 
response of the combined filter is 

$$c'_k = \frac{1}{2\pi j} \oint C'(z) z^{k-1}\, dz = \frac{1}{2\pi j} \oint \frac{z^{k-1}}{X(z)}\, dz \tag{9.4-9}$$

where the integration is performed on a closed contour that lies within the region of 
convergence of $C'(z)$. Since $X(z)$ is a polynomial with $2L$ roots $(\rho_1, \rho_2, \ldots, \rho_L, 1/\rho_1^*, 
1/\rho_2^*, \ldots, 1/\rho_L^*)$, it follows that $C'(z)$ must converge in an annular region in the z plane 
that includes the unit circle ($z = e^{j\theta}$). Consequently, the closed contour in the integral 
can be the unit circle. 

FIGURE 9.4-3 

Block diagram of channel with equivalent zero-forcing equalizer. The channel block is $X(z) = F(z)F^*(1/z^*)$ and the equalizer block is $C'(z) = 1/[F(z)F^*(1/z^*)] = 1/X(z)$. 

The performance of the infinite-tap equalizer that completely eliminates the 
intersymbol interference can be expressed in terms of the SNR at its output. For mathematical 
convenience, we normalize the received signal energy to unity.† This implies that $q_0 = 1$ 
and that the expected value of $|I_k|^2$ is also unity. Then the SNR is simply the reciprocal 
of the noise variance $\sigma_n^2$ at the output of the equalizer.‡ 

The value of $\sigma_n^2$ can be simply determined by observing that the noise sequence 
$\{\nu_k\}$ at the input to the equivalent zero-forcing equalizer $C'(z)$ has zero mean and a 
power spectral density 

$$S_{\nu\nu}(\omega) = N_0 X(e^{j\omega T}), \qquad |\omega| \le \frac{\pi}{T} \tag{9.4-10}$$

where $X(e^{j\omega T})$ is obtained from $X(z)$ by the substitution $z = e^{j\omega T}$. Since $C'(z) = 
1/X(z)$, it follows that the noise sequence at the output of the equalizer has a power 
spectral density 

$$S_{nn}(\omega) = \frac{N_0}{X(e^{j\omega T})}, \qquad |\omega| \le \frac{\pi}{T} \tag{9.4-11}$$

Consequently, the variance of the noise variable at the output of the equalizer is 

$$\sigma_n^2 = \frac{T}{2\pi} \int_{-\pi/T}^{\pi/T} S_{nn}(\omega)\, d\omega = \frac{T N_0}{2\pi} \int_{-\pi/T}^{\pi/T} \frac{d\omega}{X(e^{j\omega T})} \tag{9.4-12}$$

and the SNR for the zero-forcing equalizer is 

$$\gamma_\infty = \frac{1}{\sigma_n^2} = \Biggl[ \frac{T N_0}{2\pi} \int_{-\pi/T}^{\pi/T} \frac{d\omega}{X(e^{j\omega T})} \Biggr]^{-1} \tag{9.4-13}$$

where the subscript on $\gamma$ indicates that the equalizer has an infinite number of taps. 
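Equation 9.4-13 can be evaluated numerically for a given set of channel taps by using $X(e^{j\omega T}) = |F(e^{j\omega T})|^2$ and the change of variable $\theta = \omega T$. The sketch below uses a midpoint rule; the function name, the step count, and the sample channel are illustrative assumptions. For a real two-tap channel with $f_0^2 + f_1^2 = 1$, the standard integral $\int_0^{2\pi} d\theta/(1 + b\cos\theta) = 2\pi/\sqrt{1 - b^2}$ gives the closed form $\gamma_\infty = \sqrt{1 - 4 f_0^2 f_1^2}/N_0$, which serves as a check.

```python
import math


def zf_output_snr(f, N0, steps=4000):
    """gamma_inf = 1 / (N0 * mean over theta of 1/X(e^{j theta})), with X = |F|^2."""
    total = 0.0
    for i in range(steps):
        theta = -math.pi + (i + 0.5) * 2.0 * math.pi / steps
        # F(e^{j theta}) = sum_n f_n e^{-j n theta}
        F = sum(fn * complex(math.cos(n * theta), -math.sin(n * theta))
                for n, fn in enumerate(f))
        total += 1.0 / abs(F) ** 2
    mean_inverse = total / steps      # equals (T / 2 pi) * integral of d omega / X
    return 1.0 / (N0 * mean_inverse)


snr_two_tap = zf_output_snr([0.8, 0.6], N0=0.1)
snr_ideal = zf_output_snr([1.0], N0=0.1)
```

The ISI-free channel attains $1/N_0 = 10$, while the two-tap channel yields 2.8, showing the noise enhancement caused by inverting a channel whose spectrum dips close to zero.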

The spectral characteristics X(e jcoT ) corresponding to the Fourier transform of the 
sampled sequence {x„} has an interesting relationship to the analog filter H ((») used at 
the receiver. Since 


x k 



h*(t)h(t + kT)dt 


use of Parseval’s theorem yields 


x k 


- 1 - f" \H(co)\ 2 e ja>kT dco 

J — oo 


(9.4-14) 


†This normalization is used throughout this chapter for mathematical convenience.

‡If desired, one can multiply this normalized SNR at the output of the equalizer by the signal energy.

where $H(\omega)$ is the Fourier transform of $h(t)$. But the integral in Equation 9.4-14 can
be expressed in the form


$$x_k = \frac{1}{2\pi}\int_{-\pi/T}^{\pi/T}\left[\sum_{n=-\infty}^{\infty}
\left|H\!\left(\omega + \frac{2\pi n}{T}\right)\right|^2\right] e^{j\omega kT}\,d\omega \tag{9.4-15}$$

Now, the Fourier transform of $\{x_k\}$ is

$$X(e^{j\omega T}) = \sum_{k=-\infty}^{\infty} x_k e^{-j\omega kT} \tag{9.4-16}$$

and the inverse transform yields

$$x_k = \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} X(e^{j\omega T})\,e^{j\omega kT}\,d\omega \tag{9.4-17}$$

From a comparison of Equations 9.4-15 and 9.4-17, we obtain the desired relationship
between $X(e^{j\omega T})$ and $H(\omega)$. That is,

$$X(e^{j\omega T}) = \frac{1}{T}\sum_{n=-\infty}^{\infty}
\left|H\!\left(\omega + \frac{2\pi n}{T}\right)\right|^2, \qquad |\omega| \le \frac{\pi}{T} \tag{9.4-18}$$

where the right-hand side of Equation 9.4-18 is called the folded spectrum of $|H(\omega)|^2$.
We also observe that $|H(\omega)|^2 = X(\omega)$, where $X(\omega)$ is the Fourier transform of the
waveform $x(t)$ and $x(t)$ is the response of the matched filter to the input pulse $h(t)$.
Therefore the right-hand side of Equation 9.4-18 can also be expressed in terms of
$X(\omega)$.

Substitution for $X(e^{j\omega T})$ in Equation 9.4-13 using the result in Equation 9.4-18
yields the desired expression for the SNR in the form

$$\gamma_\infty = \left[\frac{T^2 N_0}{2\pi}\int_{-\pi/T}^{\pi/T}
\frac{d\omega}{\sum_{n=-\infty}^{\infty}\left|H(\omega + 2\pi n/T)\right|^2}\right]^{-1} \tag{9.4-19}$$

We observe that if the folded spectral characteristic of $H(\omega)$ possesses any zeros, the
integrand becomes infinite and the SNR goes to zero. In other words, the performance of
the equalizer is poor whenever the folded spectral characteristic possesses nulls or takes
on small values. This behavior occurs primarily because the equalizer, in eliminating
the intersymbol interference, enhances the additive noise. For example, if the channel
contains a spectral null in its frequency response, the linear zero-forcing equalizer
attempts to compensate for this by introducing an infinite gain at that frequency. But
this compensates for the channel distortion at the expense of enhancing the additive
noise. On the other hand, an ideal channel coupled with an appropriate signal design
that results in no intersymbol interference will have a folded spectrum that satisfies the
condition

$$\sum_{n=-\infty}^{\infty}\left|H\!\left(\omega + \frac{2\pi n}{T}\right)\right|^2 = T,
\qquad |\omega| \le \frac{\pi}{T} \tag{9.4-20}$$

In this case, the SNR achieves its maximum value, namely,

$$\gamma_\infty = \frac{1}{N_0} \tag{9.4-21}$$


Finite-length equalizer  Let us now turn our attention to an equalizer having $2K + 1$
taps. Since $c_j = 0$ for $|j| > K$, the convolution of $\{f_n\}$ with $\{c_n\}$ is zero outside the
range $-K \le n \le K + L - 1$. That is, $q_n = 0$ for $n < -K$ and $n > K + L - 1$. With
$q_0$ normalized to unity, the peak distortion is

$$\mathcal{D}(c) = \sum_{\substack{n=-K \\ n \ne 0}}^{K+L-1} |q_n|
= \sum_{\substack{n=-K \\ n \ne 0}}^{K+L-1} \left|\sum_{j} c_j f_{n-j}\right| \tag{9.4-22}$$

Although the equalizer has $2K + 1$ adjustable parameters, there are $2K + L$ nonzero
values in the response $\{q_n\}$. Therefore, it is generally impossible to completely eliminate
the intersymbol interference at the output of the equalizer. There is always some residual
interference when the optimum coefficients are used. The problem is to minimize $\mathcal{D}(c)$
with respect to the coefficients $\{c_j\}$.

The peak distortion given by Equation 9.4-22 has been shown by Lucky (1965) to
be a convex function of the coefficients $\{c_j\}$. That is, it possesses a global minimum and
no local minima. Its minimization can be carried out numerically using, for example,
the method of steepest descent. Little more can be said for the general solution to this
minimization problem. However, for one special but important case, the solution for
the minimization of $\mathcal{D}(c)$ is known. This is the case in which the distortion at the input
to the equalizer, defined as

$$D_0 = \frac{1}{|f_0|}\sum_{n=1}^{L} |f_n| \tag{9.4-23}$$

is less than unity. This condition is equivalent to having the eye open prior to equalization.
That is, the intersymbol interference is not severe enough to close the eye. Under
this condition, the peak distortion $\mathcal{D}(c)$ is minimized by selecting the equalizer
coefficients to force $q_n = 0$ for $1 \le |n| \le K$ and $q_0 = 1$. In other words, the general solution
to the minimization of $\mathcal{D}(c)$, when $D_0 < 1$, is the zero-forcing solution for $\{q_n\}$ in the
range $1 \le |n| \le K$. However, the values of $\{q_n\}$ for $K + 1 \le n \le K + L - 1$ are nonzero,
in general. These nonzero values constitute the residual intersymbol interference at the
output of the equalizer.
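
The following sketch illustrates the finite-length zero-forcing solution under stated
assumptions: a hypothetical two-tap channel $f_0 = 1$, $f_1 = 0.5$ (so that $D_0 = 0.5 < 1$) and
an equalizer with $K = 3$ whose causal taps are the truncated expansion of $1/F(z)$. The helper
name `convolve` and all numerical values are illustrative only.

```python
def convolve(c, f):
    """Full linear convolution of two finite real sequences."""
    q = [0.0] * (len(c) + len(f) - 1)
    for i, ci in enumerate(c):
        for j, fj in enumerate(f):
            q[i + j] += ci * fj
    return q

K = 3
f = [1.0, 0.5]                                   # hypothetical channel taps f_0, f_1
d0 = sum(abs(x) for x in f[1:]) / abs(f[0])      # input distortion, Equation 9.4-23

# Taps c_{-K}, ..., c_K: zero for j < 0, and (-0.5)^j for 0 <= j <= K,
# i.e. the first K + 1 terms of 1 / F(z) = sum_j (-0.5)^j z^{-j}.
c = [0.0] * K + [(-0.5) ** j for j in range(K + 1)]

q = convolve(c, f)                               # q[i] holds q_n with n = i - K
q0 = q[K]
inside = q[K + 1:2 * K + 1]                      # q_1, ..., q_K (forced to zero)
peak_distortion = sum(abs(x) for i, x in enumerate(q) if i != K)
print(d0, q0, inside, peak_distortion)
```

With these taps the samples $q_1, \ldots, q_K$ vanish and the only residual interference is the
single sample just beyond the equalizer span, which shrinks as $K$ grows.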


9.4-2 Mean-Square-Error (MSE) Criterion 

In the MSE criterion, the tap weight coefficients $\{c_j\}$ of the equalizer are adjusted to
minimize the mean square value of the error

$$\varepsilon_k = I_k - \hat{I}_k \tag{9.4-24}$$

where $I_k$ is the information symbol transmitted in the $k$th signaling interval and $\hat{I}_k$ is
the estimate of that symbol at the output of the equalizer, defined in Equation 9.4-1.
When the information symbols $\{I_k\}$ are complex-valued, the performance index for the
MSE criterion, denoted by $J$, is defined as

$$J = E|\varepsilon_k|^2 = E|I_k - \hat{I}_k|^2 \tag{9.4-25}$$

On the other hand, when the information symbols are real-valued, the performance index
is simply the square of the real part of $\varepsilon_k$. In either case, $J$ is a quadratic function of the
equalizer coefficients $\{c_j\}$. In the following discussion, we consider the minimization
of the complex-valued form given in Equation 9.4-25.


Infinite-length equalizer  First, we shall derive the tap weight coefficients that
minimize $J$ when the equalizer has an infinite number of taps. In this case, the estimate
$\hat{I}_k$ is expressed as

$$\hat{I}_k = \sum_{j=-\infty}^{\infty} c_j v_{k-j} \tag{9.4-26}$$

Substitution of Equation 9.4-26 into the expression for $J$ given in Equation 9.4-25 and
expansion of the result yields a quadratic function of the coefficients $\{c_j\}$. This function
can be easily minimized with respect to the $\{c_j\}$ to yield a set (infinite in number) of
linear equations for the $\{c_j\}$. Alternatively, the set of linear equations can be obtained
by invoking the orthogonality principle in mean square estimation. That is, we select
the coefficients $\{c_j\}$ to render the error $\varepsilon_k$ orthogonal to the signal sequence $\{v^*_{k-l}\}$ for
$-\infty < l < \infty$. Thus,

$$E(\varepsilon_k v^*_{k-l}) = 0, \qquad -\infty < l < \infty \tag{9.4-27}$$

Substitution for $\varepsilon_k$ in Equation 9.4-27 yields

$$E\left[\left(I_k - \sum_{j=-\infty}^{\infty} c_j v_{k-j}\right) v^*_{k-l}\right] = 0$$

or, equivalently,

$$\sum_{j=-\infty}^{\infty} c_j E(v_{k-j} v^*_{k-l}) = E(I_k v^*_{k-l}),
\qquad -\infty < l < \infty \tag{9.4-28}$$

To evaluate the moments in Equation 9.4-28, we use the expression for $v_k$ given
in Equation 9.3-16. Thus, we obtain

$$E(v_{k-j} v^*_{k-l}) = \sum_{n=0}^{L} f_n^* f_{n+l-j} + N_0\delta_{lj}
= \begin{cases} x_{l-j} + N_0\delta_{lj} & (|l-j| \le L) \\ 0 & (\text{otherwise}) \end{cases} \tag{9.4-29}$$

and

$$E(I_k v^*_{k-l}) = \begin{cases} f^*_{-l} & (-L \le l \le 0) \\ 0 & (\text{otherwise}) \end{cases} \tag{9.4-30}$$

Now, if we substitute Equations 9.4-29 and 9.4-30 into Equation 9.4-28 and take the
z transform of both sides of the resulting equation, we obtain

$$C(z)\left[F(z)F^*(1/z^*) + N_0\right] = F^*(1/z^*) \tag{9.4-31}$$


Therefore, the transfer function of the equalizer based on the MSE criterion is

$$C(z) = \frac{F^*(1/z^*)}{F(z)F^*(1/z^*) + N_0} \tag{9.4-32}$$

When the noise-whitening filter is incorporated into $C(z)$, we obtain an equivalent
equalizer having the transfer function

$$C'(z) = \frac{1}{F(z)F^*(1/z^*) + N_0} = \frac{1}{X(z) + N_0} \tag{9.4-33}$$


We observe that the only difference between this expression for $C'(z)$ and the
one based on the peak distortion criterion is the noise spectral density factor $N_0$ that
appears in Equation 9.4-33. When $N_0$ is very small in comparison with the signal,
the coefficients that minimize the peak distortion $\mathcal{D}(c)$ are approximately equal to
the coefficients that minimize the MSE performance index $J$. That is, in the limit as
$N_0 \to 0$, the two criteria yield the same solution for the tap weights. Consequently,
when $N_0 = 0$, the minimization of the MSE results in complete elimination of the
intersymbol interference. On the other hand, that is not the case when $N_0 \ne 0$. In
general, when $N_0 \ne 0$, there is both residual intersymbol interference and additive
noise at the output of the equalizer.

A measure of the residual intersymbol interference and additive noise is obtained
by evaluating the minimum value of $J$, denoted by $J_{\min}$, when the transfer function $C(z)$
of the equalizer is given by Equation 9.4-32. Since
$J = E|\varepsilon_k|^2 = E(\varepsilon_k I_k^*) - E(\varepsilon_k \hat{I}_k^*)$,
and since $E(\varepsilon_k \hat{I}_k^*) = 0$ by virtue of the orthogonality conditions given in Equation
9.4-27, it follows that

$$J_{\min} = E(\varepsilon_k I_k^*)
= E|I_k|^2 - \sum_{j=-\infty}^{\infty} c_j E(v_{k-j} I_k^*)
= 1 - \sum_{j=-\infty}^{\infty} c_j f_{-j} \tag{9.4-34}$$


This particular form for $J_{\min}$ is not very informative. More insight on the performance
of the equalizer as a function of the channel characteristics is obtained when the
summation in Equation 9.4-34 is transformed into the frequency domain. This can be
accomplished by first noting that the summation in Equation 9.4-34 is the convolution
of $\{c_j\}$ with $\{f_j\}$, evaluated at a shift of zero. Thus, if $\{b_k\}$ denotes the convolution of
these two sequences, the summation in Equation 9.4-34 is simply equal to $b_0$. Since
the z transform of the sequence $\{b_k\}$ is

$$B(z) = C(z)F(z) = \frac{F(z)F^*(1/z^*)}{F(z)F^*(1/z^*) + N_0} = \frac{X(z)}{X(z) + N_0} \tag{9.4-35}$$

the term $b_0$ is

$$b_0 = \frac{1}{2\pi j}\oint \frac{B(z)}{z}\,dz
= \frac{1}{2\pi j}\oint \frac{X(z)}{z\left[X(z) + N_0\right]}\,dz \tag{9.4-36}$$


The contour integral in Equation 9.4-36 can be transformed into an equivalent line
integral by the change of variable $z = e^{j\omega T}$. The result of this change of variable is

$$b_0 = \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \frac{X(e^{j\omega T})}{X(e^{j\omega T}) + N_0}\,d\omega \tag{9.4-37}$$

Finally, substitution of the result in Equation 9.4-37 for the summation in Equation
9.4-34 yields the desired expression for the minimum MSE in the form

$$\begin{aligned}
J_{\min} &= 1 - \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \frac{X(e^{j\omega T})}{X(e^{j\omega T}) + N_0}\,d\omega \\
&= \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \frac{N_0}{X(e^{j\omega T}) + N_0}\,d\omega \\
&= \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T}
\frac{N_0}{T^{-1}\sum_{n=-\infty}^{\infty}\left|H(\omega + 2\pi n/T)\right|^2 + N_0}\,d\omega
\end{aligned} \tag{9.4-38}$$

In the absence of intersymbol interference, $X(e^{j\omega T}) = 1$ and, hence,

$$J_{\min} = \frac{N_0}{1 + N_0} \tag{9.4-39}$$

We observe that $0 \le J_{\min} \le 1$. Furthermore, the relationship between the output
(normalized by the signal energy) SNR $\gamma_\infty$ and $J_{\min}$ must be

$$\gamma_\infty = \frac{1 - J_{\min}}{J_{\min}} \tag{9.4-40}$$


More importantly, this relation between $\gamma_\infty$ and $J_{\min}$ also holds when there is residual
intersymbol interference in addition to the noise.

Finite-length equalizer  Let us now turn our attention to the case in which the
transversal equalizer spans a finite time duration. The output of the equalizer in the $k$th
signaling interval is

$$\hat{I}_k = \sum_{j=-K}^{K} c_j v_{k-j} \tag{9.4-41}$$

The MSE for the equalizer having $2K + 1$ taps, denoted by $J(K)$, is

$$J(K) = E|I_k - \hat{I}_k|^2 = E\left|I_k - \sum_{j=-K}^{K} c_j v_{k-j}\right|^2 \tag{9.4-42}$$

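Minimization of $J(K)$ with respect to the tap weights $\{c_j\}$ or, equivalently, forcing
the error $\varepsilon_k = I_k - \hat{I}_k$ to be orthogonal to the signal samples $v^*_{k-l}$, $|l| \le K$, yields the
following set of simultaneous equations:

$$\sum_{j=-K}^{K} c_j \Gamma_{lj} = \xi_l, \qquad l = -K, \ldots, -1, 0, 1, \ldots, K \tag{9.4-43}$$

where

$$\Gamma_{lj} = \begin{cases} x_{l-j} + N_0\delta_{lj} & (|l-j| \le L) \\ 0 & (\text{otherwise}) \end{cases} \tag{9.4-44}$$

and

$$\xi_l = \begin{cases} f^*_{-l} & (-L \le l \le 0) \\ 0 & (\text{otherwise}) \end{cases} \tag{9.4-45}$$

It is convenient to express the set of linear equations in matrix form. Thus,

$$\boldsymbol{\Gamma}\mathbf{C} = \boldsymbol{\xi} \tag{9.4-46}$$

where $\mathbf{C}$ denotes the column vector of $2K + 1$ tap weight coefficients, $\boldsymbol{\Gamma}$ denotes the
$(2K + 1) \times (2K + 1)$ Hermitian covariance matrix with elements $\Gamma_{lj}$, and $\boldsymbol{\xi}$ is a
$(2K + 1)$-dimensional column vector with elements $\xi_l$. The solution of Equation 9.4-46 is

$$\mathbf{C}_{\text{opt}} = \boldsymbol{\Gamma}^{-1}\boldsymbol{\xi} \tag{9.4-47}$$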

Thus, the solution for $\mathbf{C}_{\text{opt}}$ involves inverting the matrix $\boldsymbol{\Gamma}$. The optimum tap weight
coefficients given by Equation 9.4-47 minimize the performance index $J(K)$, with the
result that the minimum value of $J(K)$ is

$$J_{\min}(K) = 1 - \sum_{j=-K}^{0} c_j f_{-j}
= 1 - \boldsymbol{\xi}^H \boldsymbol{\Gamma}^{-1} \boldsymbol{\xi} \tag{9.4-48}$$

where $H$ represents the conjugate transpose. $J_{\min}(K)$ may be used in Equation 9.4-40
to compute the output SNR for the linear equalizer with $2K + 1$ tap coefficients.
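
A minimal sketch of Equations 9.4-43 through 9.4-48 follows, assuming a real-valued channel
so that conjugates can be dropped. The helper names `solve` and `min_mse`, the two-tap channel
$f = (0.8, 0.6)$, and $N_0 = 0.1$ are choices made for this illustration and are not taken from
the text. The linear system is solved by plain Gaussian elimination rather than by explicit
matrix inversion.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (real entries)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def min_mse(f, n0, K):
    """J_min(K) of Equation 9.4-48 for a real channel f_0..f_L and 2K + 1 taps."""
    L = len(f) - 1
    # x_k = sum_n f_n f_{n+k}: autocorrelation of the channel taps
    x = [sum(f[n] * f[n + k] for n in range(L + 1 - k)) for k in range(L + 1)]
    idx = range(-K, K + 1)
    gamma = [[(x[abs(l - j)] if abs(l - j) <= L else 0.0) + (n0 if l == j else 0.0)
              for j in idx] for l in idx]                    # Equation 9.4-44
    xi = [f[-l] if -L <= l <= 0 else 0.0 for l in idx]       # Equation 9.4-45
    c = solve(gamma, xi)                                     # Equation 9.4-47
    return 1.0 - sum(cj * xj for cj, xj in zip(c, xi))       # Equation 9.4-48

f, n0 = [0.8, 0.6], 0.1
values = [min_mse(f, n0, K) for K in (1, 3, 20)]
# Infinite-tap limit for this channel, from Equation 9.4-38 in closed form
j_inf = n0 / math.sqrt(n0 ** 2 + 2.0 * n0 + (f[0] ** 2 - f[1] ** 2) ** 2)
print(values, j_inf)
```

Since a longer equalizer can always reproduce a shorter one by setting the extra taps to zero,
$J_{\min}(K)$ cannot increase with $K$, and it approaches the infinite-tap value from above.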
9.4-3 Performance Characteristics of the MSE Equalizer


In this section, we consider the performance characteristics of the linear equalizer that
is optimized by using the MSE criterion. Both the minimum MSE and the probability of
error are considered as performance measures for some specific channels. We begin by
evaluating the minimum MSE $J_{\min}$ and the output SNR $\gamma_\infty$ for two specific channels.
Then, we consider the evaluation of the probability of error.

Example 9.4-1. First, we consider an equivalent discrete-time channel model consisting
of two components $f_0$ and $f_1$, which are normalized to $|f_0|^2 + |f_1|^2 = 1$.
Then

$$F(z) = f_0 + f_1 z^{-1} \tag{9.4-49}$$

and

$$X(z) = f_0 f_1^* z + 1 + f_0^* f_1 z^{-1} \tag{9.4-50}$$

The corresponding frequency response is

$$X(e^{j\omega T}) = f_0 f_1^* e^{j\omega T} + 1 + f_0^* f_1 e^{-j\omega T}
= 1 + 2|f_0||f_1|\cos(\omega T + \theta) \tag{9.4-51}$$

where $\theta$ is the angle of $f_0 f_1^*$. We note that this channel characteristic possesses a null
at $\omega = \pi/T$ when $f_0 = f_1 = 1/\sqrt{2}$.

A linear equalizer with an infinite number of taps, adjusted on the basis of the
MSE criterion, will have the minimum MSE given by Equation 9.4-38. Evaluation of
the integral in Equation 9.4-38 for the $X(e^{j\omega T})$ given in Equation 9.4-51 yields the
result

$$J_{\min} = \frac{N_0}{\sqrt{N_0^2 + 2N_0\left(|f_0|^2 + |f_1|^2\right) + \left(|f_0|^2 - |f_1|^2\right)^2}}
= \frac{N_0}{\sqrt{N_0^2 + 2N_0 + \left(|f_0|^2 - |f_1|^2\right)^2}} \tag{9.4-52}$$

Let us consider the special case in which $f_0 = f_1 = 1/\sqrt{2}$. The minimum MSE is
$J_{\min} = N_0/\sqrt{N_0^2 + 2N_0}$ and the corresponding output SNR is

$$\gamma_\infty = \sqrt{1 + \frac{2}{N_0}} - 1 \approx \left(\frac{2}{N_0}\right)^{1/2},
\qquad N_0 \ll 1 \tag{9.4-53}$$

This result should be compared with the output SNR of $1/N_0$ obtained in the case of
no intersymbol interference. A significant loss in SNR occurs from this channel.
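
The closed form for this two-tap channel can be checked against a direct numerical evaluation
of Equation 9.4-38. The sketch below assumes real taps (so $\theta = 0$) and an illustrative
value $N_0 = 0.01$; the function names are hypothetical.

```python
import math

def j_min_numeric(f0, f1, n0, steps=20000):
    """Midpoint-rule evaluation of Equation 9.4-38 with X = 1 + 2 f0 f1 cos(wT)."""
    total = 0.0
    for i in range(steps):
        theta = -math.pi + (i + 0.5) * 2.0 * math.pi / steps   # theta = omega * T
        total += n0 / (1.0 + 2.0 * f0 * f1 * math.cos(theta) + n0)
    return total / steps

def j_min_closed(f0, f1, n0):
    """Closed form of Equation 9.4-52 for real taps with f0^2 + f1^2 = 1."""
    return n0 / math.sqrt(n0 ** 2 + 2.0 * n0 + (f0 ** 2 - f1 ** 2) ** 2)

n0 = 0.01
f = 1.0 / math.sqrt(2.0)                  # the spectral-null case f0 = f1
j_num = j_min_numeric(f, f, n0)
j_cf = j_min_closed(f, f, n0)
snr = (1.0 - j_cf) / j_cf                 # Equation 9.4-40
snr_approx = math.sqrt(2.0 / n0)          # small-N0 approximation in Equation 9.4-53
snr_no_isi = 1.0 / n0
print(j_num, j_cf, snr, snr_approx, snr_no_isi)
```

Comparing `snr` with `snr_no_isi` gives the SNR penalty of linear MSE equalization on the
channel with a spectral null for the chosen noise level.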


Example 9.4-2. As a second example, we consider an exponentially decaying characteristic
of the form

$$f_k = \sqrt{1 - a^2}\,a^k, \qquad k = 0, 1, \ldots$$

where $a < 1$. The Fourier transform of this sequence is

$$X(e^{j\omega T}) = \frac{1 - a^2}{1 + a^2 - 2a\cos\omega T} \tag{9.4-54}$$

which is a function that contains a minimum at $\omega = \pi/T$.
The output SNR for this channel is

$$\gamma_\infty = \left[\sqrt{1 + 2N_0\frac{1 + a^2}{1 - a^2} + N_0^2} - 1\right]^{-1}
\approx \frac{1 - a^2}{(1 + a^2)N_0}, \qquad N_0 \ll 1 \tag{9.4-55}$$

Therefore, the loss in SNR due to the presence of the interference is

$$-10\log_{10}\left(\frac{1 - a^2}{1 + a^2}\right)$$
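
A short sketch of the loss formula above, together with the exact expression in Equation
9.4-55, is given below. The function names and the sample values of $a$ and $N_0$ are
illustrative assumptions.

```python
import math

def snr_loss_db(a: float) -> float:
    """SNR loss in dB, -10 log10((1 - a^2) / (1 + a^2)), for the exponential channel."""
    return -10.0 * math.log10((1.0 - a * a) / (1.0 + a * a))

def exact_snr(a: float, n0: float) -> float:
    """Exact output SNR from Equation 9.4-55."""
    ratio = (1.0 + a * a) / (1.0 - a * a)
    return 1.0 / (math.sqrt(1.0 + 2.0 * n0 * ratio + n0 * n0) - 1.0)

losses = {a: snr_loss_db(a) for a in (0.1, 0.5, 0.9)}
for a, loss in losses.items():
    print(f"a = {a}: loss = {loss:.2f} dB, exact SNR at N0 = 1e-4: {exact_snr(a, 1e-4):.1f}")
```

The loss grows as $a$ approaches 1, that is, as the channel memory lengthens and the spectral
minimum at $\omega = \pi/T$ deepens.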


Probability of error performance of linear MSE equalizer  Above, we discussed
the performance of the linear equalizer in terms of the minimum achievable MSE $J_{\min}$
and the output SNR $\gamma$ that is related to $J_{\min}$ through the formula in Equation 9.4-40.
Unfortunately, there is no simple relationship between these quantities and the probability
of error. The reason is that the linear MSE equalizer contains some residual
intersymbol interference at its output. This situation is unlike that of the infinitely long
zero-forcing equalizer, for which there is no residual interference, but only Gaussian
noise. The residual interference at the output of the MSE equalizer is not well characterized
as an additional Gaussian noise term, and, hence, the output SNR does not
translate easily into an equivalent error probability.

One approach to computing the error probability is a brute force method that yields
an exact result. To illustrate this method, let us consider a PAM signal in which the
information symbols are selected from the set of values $2n - M - 1$, $n = 1, 2, \ldots, M$,
with equal probability. Now consider the decision on the symbol $I_n$. The estimate of $I_n$
is

$$\hat{I}_n = q_0 I_n + \sum_{k \ne n} I_k q_{n-k} + \sum_{j=-K}^{K} c_j \eta_{n-j} \tag{9.4-56}$$

where $\{q_n\}$ represent the convolution of the impulse response of the equalizer and
equivalent channel, i.e.,

$$q_n = \sum_{k=-K}^{K} c_k f_{n-k} \tag{9.4-57}$$

and the input signal to the equalizer is

$$v_k = \sum_{j=0}^{L} f_j I_{k-j} + \eta_k \tag{9.4-58}$$

The first term in the right-hand side of Equation 9.4-56 is the desired symbol, the
middle term is the intersymbol interference, and the last term is the Gaussian noise.
The variance of the noise is

$$\sigma_n^2 = N_0 \sum_{j=-K}^{K} c_j^2 \tag{9.4-59}$$

For an equalizer with $2K + 1$ taps and a channel response that spans $L + 1$ symbols,
the number of symbols involved in the intersymbol interference is $2K + L$.

Define

$$D = \sum_{k \ne n} I_k q_{n-k} \tag{9.4-60}$$


For a particular sequence of $2K + L$ information symbols, say the sequence $\mathbf{I}_J$, the
intersymbol interference term $D \equiv D_J$ is fixed. The probability of error for a fixed $D_J$
is

$$P_e(D_J) = \frac{2(M-1)}{M}P(N + D_J > q_0)
= \frac{2(M-1)}{M}\,Q\!\left(\sqrt{\frac{(q_0 - D_J)^2}{\sigma_n^2}}\right) \tag{9.4-61}$$

where $N$ denotes the additive noise term. The average probability of error is obtained
by averaging $P_e(D_J)$ over all possible sequences $\mathbf{I}_J$. That is,

$$P_e = \sum_{\mathbf{I}_J} P_e(D_J)P(\mathbf{I}_J)
= \frac{2(M-1)}{M}\sum_{\mathbf{I}_J} Q\!\left(\sqrt{\frac{(q_0 - D_J)^2}{\sigma_n^2}}\right)P(\mathbf{I}_J) \tag{9.4-62}$$

When all the sequences are equally likely,

$$P(\mathbf{I}_J) = \frac{1}{M^{2K+L}} \tag{9.4-63}$$


The conditional error probability terms $P_e(D_J)$ are dominated by the sequence that
yields the largest value of $D_J$. This occurs when $I_n = \pm(M - 1)$ and the signs of the
information symbols match the signs of the corresponding $\{q_n\}$. Then,

$$D_J^* = (M - 1)\sum_{k \ne 0} |q_k|$$

and

$$P_e(D_J^*) = \frac{2(M-1)}{M}\,Q\!\left(\sqrt{\frac{q_0^2}{\sigma_n^2}
\left(1 - \frac{M-1}{q_0}\sum_{k \ne 0}|q_k|\right)^2}\right) \tag{9.4-64}$$

Thus, an upper bound on the average probability of error for equally likely symbol
sequences is

$$P_e \le P_e(D_J^*) \tag{9.4-65}$$

If the computation of the exact error probability in Equation 9.4-62 proves to be
too cumbersome and too time consuming because of the large number of terms in the
sum and if the upper bound is too loose, one can resort to one of a number of different
approximate methods that have been devised, which are known to yield tight bounds
on $P_e$. A discussion of these different approaches would take us too far afield. The
interested reader is referred to the papers by Saltzberg (1968), Lugannani (1969), Ho
and Yeh (1970), Shimbo and Celebiler (1971), Glave (1972), Yao (1972), and Yao and
Tobin (1976).
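
The brute force computation of Equation 9.4-62 and the worst-case bound of Equation 9.4-65
can be sketched as follows. The combined response `q`, the noise standard deviation, and the
function names are hypothetical values chosen for illustration, and the eye is assumed to be
open so that $q_0 - D_J > 0$ for every sequence.

```python
import itertools
import math

def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def error_probabilities(q, zero_index, sigma, M=2):
    """Exact P_e (Eq. 9.4-62) and worst-case bound (Eq. 9.4-65) for M-ary PAM.

    q is the combined channel/equalizer response, q[zero_index] is q_0, and sigma
    is the noise standard deviation at the equalizer output (eye assumed open).
    """
    q0 = q[zero_index]
    isi_taps = [x for i, x in enumerate(q) if i != zero_index]
    levels = [2 * n - M - 1 for n in range(1, M + 1)]
    scale = 2.0 * (M - 1) / M
    total = 0.0
    for seq in itertools.product(levels, repeat=len(isi_taps)):
        d = sum(s * t for s, t in zip(seq, isi_taps))       # Eq. 9.4-60
        total += scale * q_func((q0 - d) / sigma)           # Eq. 9.4-61
    exact = total / M ** len(isi_taps)                      # equally likely sequences
    d_star = (M - 1) * sum(abs(t) for t in isi_taps)
    bound = scale * q_func((q0 - d_star) / sigma)           # Eq. 9.4-64
    return exact, bound

exact, bound = error_probabilities([0.05, 1.0, -0.15, 0.1], zero_index=1, sigma=0.25)
no_isi = q_func(1.0 / 0.25)
print(f"no ISI: {no_isi:.3e}, exact: {exact:.3e}, bound: {bound:.3e}")
```

The number of terms grows as $M$ raised to the number of interfering symbols, which is why the
exact computation becomes impractical for long responses and the approximate methods cited
above are of interest.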

As an illustration of the performance limitations of a linear equalizer in the presence
of severe intersymbol interference, we show in Figure 9.4-4 the probability of
error for binary (antipodal) signaling, as measured by Monte Carlo simulation, for
the three discrete-time channel characteristics shown in Figure 9.4-5. For purposes of
comparison, the performance obtained for a channel with no intersymbol interference
is also illustrated in Figure 9.4-4. The equivalent discrete-time channel shown in
Figure 9.4-5a is typical of the response of a good-quality telephone channel. In contrast,
the equivalent discrete-time channel characteristics shown in Figure 9.4-5b and c result
in severe intersymbol interference. The spectral characteristics $|X(e^{j\omega})|$ for the three
channels, illustrated in Figure 9.4-6, clearly show that the channel in Figure 9.4-5c has
the worst spectral characteristic. Hence the performance of the linear equalizer for this
channel is the poorest of the three cases. Next in performance is the channel shown in
Figure 9.4-5b, and finally, the best performance is obtained with the channel shown in
Figure 9.4-5a. In fact, the error rate of the latter is within 3 dB of the error rate achieved
with no interference.

FIGURE 9.4-4
Error rate performance of linear MSE equalizer. Thirty-one taps in transversal equalizer.

FIGURE 9.4-5
Three discrete-time channel characteristics, panels (a), (b), and (c).

One conclusion reached from the results on output SNR $\gamma_\infty$ and the limited probability
of error results illustrated in Figure 9.4-4 is that a linear equalizer yields good
performance on channels such as telephone lines, where the spectral characteristics of
the channels are well behaved and do not exhibit spectral nulls. On the other hand,
a linear equalizer is inadequate as a compensator for the intersymbol interference on
channels with spectral nulls, which may be encountered in radio transmission. In general,
the channel spectral nulls result in a large noise enhancement at the output of the
linear equalizer.

The basic limitation of the linear equalizer in coping with severe ISI has motivated
a considerable amount of research into nonlinear equalizers with low computational
complexity. The decision-feedback equalizer described in Section 9.5 is shown to be
an effective solution to this problem.

FIGURE 9.4-6
Amplitude spectra for the channels shown in Figure 9.4-5a, b, and c, respectively.


9.4-4 Fractionally Spaced Equalizers 

In the linear equalizer structures that we have described in the previous section, the
equalizer taps are spaced at the reciprocal of the symbol rate, i.e., at the reciprocal of the
signaling rate $1/T$. This tap spacing is optimum if the equalizer is preceded by a filter
matched to the channel-distorted transmitted pulse. When the channel characteristics
are unknown, the receiver filter is sometimes matched to the transmitted signal pulse
and the sampling time is optimized for this suboptimum filter. In general, this approach
leads to an equalizer performance that is very sensitive to the choice of sampling time.

The limitations of the symbol rate equalizer are most easily evident in the frequency
domain. From Equation 9.2-5, the spectrum of the signal at the input to the equalizer
may be expressed as

$$Y_T(f) = \frac{1}{T}\sum_{n} X\!\left(f - \frac{n}{T}\right) e^{j2\pi(f - n/T)\tau_0} \tag{9.4-66}$$

where $Y_T(f)$ is the folded or aliased spectrum, where the folding frequency is $1/2T$.
Note that the received signal spectrum is dependent on the choice of the sampling delay
$\tau_0$. The signal spectrum at the output of the equalizer is $C_T(f)Y_T(f)$, where

$$C_T(f) = \sum_{k=-K}^{K} c_k e^{-j2\pi f kT} \tag{9.4-67}$$

It is clear from these relationships that the symbol rate equalizer can only compensate
for the frequency-response characteristics of the aliased received signal. It cannot
compensate for the channel distortion inherent in $X(f)e^{j2\pi f\tau_0}$.
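
The dependence of the aliased spectrum on the sampling delay can be made concrete with a
small sketch of Equation 9.4-66. It assumes an undistorted raised-cosine $X(f)$ with $T = 1$
and roll-off 0.5; the function names and parameter values are illustrative only. At the band
edge $f = 1/2T$ the two overlapping aliases add in phase for $\tau_0 = 0$ and cancel for
$\tau_0 = T/2$, creating a spectral null that a symbol rate equalizer would then have to invert.

```python
import cmath
import math

def raised_cosine(f: float, T: float, beta: float) -> float:
    """Raised-cosine spectrum X(f) with symbol interval T and roll-off beta > 0."""
    af = abs(f)
    lo = (1.0 - beta) / (2.0 * T)
    hi = (1.0 + beta) / (2.0 * T)
    if af <= lo:
        return T
    if af >= hi:
        return 0.0
    return 0.5 * T * (1.0 + math.cos(math.pi * T / beta * (af - lo)))

def folded_spectrum(f, tau0, T=1.0, beta=0.5, n_max=3):
    """Y_T(f) of Equation 9.4-66, summing the aliases n = -n_max..n_max."""
    total = 0j
    for n in range(-n_max, n_max + 1):
        shifted = f - n / T
        total += raised_cosine(shifted, T, beta) * cmath.exp(2j * math.pi * shifted * tau0)
    return total / T

band_edge = 0.5                                   # f = 1/(2T) with T = 1
aligned = abs(folded_spectrum(band_edge, tau0=0.0))
offset = abs(folded_spectrum(band_edge, tau0=0.5))
print(aligned, offset)
```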

In contrast to the symbol rate equalizer, a fractionally spaced equalizer (FSE) is
based on sampling the incoming signal at least as fast as the Nyquist rate. For example,
if the transmitted signal consists of pulses having a raised cosine spectrum with a roll-off
factor $\beta$, its spectrum extends to $F_{\max} = (1 + \beta)/2T$. This signal can be sampled
at the receiver at a rate

$$2F_{\max} = \frac{1 + \beta}{T} \tag{9.4-68}$$

and then passed through an equalizer with tap spacing of $T/(1 + \beta)$. For example, if
$\beta = 1$, we would have a $\frac{1}{2}T$-spaced equalizer. If $\beta = 0.5$, we would have a $\frac{2}{3}T$-spaced
equalizer, and so forth. In general, then, a digitally implemented fractionally spaced
equalizer has tap spacing of $MT/N$ where $M$ and $N$ are integers and $N > M$. Usually,
a $\frac{1}{2}T$-spaced equalizer is used in many applications.
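
The mapping from roll-off factor to tap spacing $MT/N$ can be sketched with exact rational
arithmetic. The function name `tap_spacing` and the third sample value $\beta = 1/5$ are
illustrative assumptions.

```python
from fractions import Fraction

def tap_spacing(beta: Fraction) -> Fraction:
    """Tap spacing T'/T = 1 / (1 + beta) implied by Equation 9.4-68."""
    return 1 / (1 + beta)

for beta in (Fraction(1), Fraction(1, 2), Fraction(1, 5)):
    spacing = tap_spacing(beta)
    # numerator and denominator play the roles of M and N in T' = MT/N
    print(f"beta = {beta}: T' = ({spacing.numerator}/{spacing.denominator}) T")
```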

Since the frequency response of the FSE is

$$C_{T'}(f) = \sum_{k=-K}^{K} c_k e^{-j2\pi f kT'} \tag{9.4-69}$$

where $T' = MT/N$, it follows that $C_{T'}(f)$ can equalize the received signal spectrum
beyond the Nyquist frequency $f = 1/2T$ to $f = (1 + \beta)/2T = N/2MT$. The
equalized spectrum is

$$C_{T'}(f)Y_{T'}(f) = C_{T'}(f)\sum_{n} X\!\left(f - \frac{n}{T'}\right) e^{j2\pi(f - n/T')\tau_0}
= C_{T'}(f)\sum_{n} X\!\left(f - \frac{nN}{MT}\right) e^{j2\pi(f - nN/MT)\tau_0} \tag{9.4-70}$$

Since $X(f) = 0$ for $|f| > N/2MT$, Equation 9.4-70 may be expressed as

$$C_{T'}(f)Y_{T'}(f) = C_{T'}(f)X(f)e^{j2\pi f\tau_0}, \qquad |f| \le \frac{1}{2T'} \tag{9.4-71}$$

Thus, we observe that the FSE compensates for the channel distortion in the received
signal before the aliasing effects due to symbol rate sampling. In other words, $C_{T'}(f)$
can compensate for an arbitrary timing phase.

The FSE output is sampled at the symbol rate $1/T$ and has the spectrum

$$\sum_{k} C_{T'}\!\left(f - \frac{k}{T}\right) X\!\left(f - \frac{k}{T}\right) e^{j2\pi(f - k/T)\tau_0} \tag{9.4-72}$$

In effect, the optimum FSE is equivalent to the optimum linear receiver consisting of
the matched filter followed by a symbol rate equalizer.

Let us now consider the adjustment of the tap coefficients in the FSE. The input to
the FSE may be expressed as

$$y\!\left(\frac{kMT}{N}\right) = \sum_{n} I_n\,x\!\left(\frac{kMT}{N} - nT\right) + \nu\!\left(\frac{kMT}{N}\right) \tag{9.4-73}$$

In each symbol interval, the FSE produces an output of the form

$$\hat{I}_k = \sum_{n=-K}^{K} c_n\,y\!\left(kT - \frac{nMT}{N}\right) \tag{9.4-74}$$

where the coefficients of the equalizer are selected to minimize the MSE. This
optimization leads to a set of linear equations for the equalizer coefficients that have
the solution

$$\mathbf{C}_{\text{opt}} = \mathbf{A}^{-1}\boldsymbol{\alpha} \tag{9.4-75}$$

where $\mathbf{A}$ is the covariance matrix of the input data and $\boldsymbol{\alpha}$ is the vector of cross
correlations. These equations are identical in form to those for the symbol rate equalizer,
but there are some subtle differences. One is that $\mathbf{A}$ is Hermitian, but not Toeplitz. In
addition, $\mathbf{A}$ exhibits periodicities that are inherent in a cyclostationary process, as shown
by Qureshi (1985). As a result of the fractional spacing, some of the eigenvalues of
$\mathbf{A}$ are nearly zero. Attempts have been made by Long et al. (1988a, b) to exploit this
property in the coefficient adjustment.

An analysis of the performance of fractionally spaced equalizers, including their
convergence properties, is given in a paper by Ungerboeck (1976). Simulation results
demonstrating the effectiveness of the FSE over a symbol rate equalizer have also
been given in the papers by Qureshi and Forney (1977) and Gitlin and Weinstein
(1981). We cite two examples from these papers. First, Figure 9.4-7 illustrates the
performance of the symbol rate equalizer and a $\frac{1}{2}T$-FSE for a channel with high-end
amplitude distortion, whose characteristics are also shown in this figure. The
symbol-spaced equalizer was preceded with a filter matched to the transmitted pulse that had a
(square-root) raised cosine spectrum with a 20 percent roll-off ($\beta = 0.2$). The FSE did
not have any filter preceding it. The symbol rate was 2400 symbols/s and the modulation
was QAM. The received SNR was 30 dB. Both equalizers had 31 taps; hence, the
$\frac{1}{2}T$-FSE spanned one-half of the time interval of the symbol rate equalizer.
Nevertheless, the FSE outperformed the symbol rate equalizer when the latter was optimized at
the best sampling time. Furthermore, the FSE did not exhibit any sensitivity to timing
phase, as illustrated in Figure 9.4-7b.

Similar results were obtained by Gitlin and Weinstein. For a channel with poor
envelope delay characteristics, the SNR performance of the symbol rate equalizer and
a $\frac{1}{2}T$-FSE are illustrated in Figure 9.4-8. In this case, both equalizers had the same
time span. The $T$-spaced equalizer had 24 taps while the FSE had 48 taps. The symbol
rate was 2400 symbols/s and the data rate was 9600 bits/s with 16-QAM modulation.
The signal pulse had a raised cosine spectrum with $\beta = 0.12$. Note again that the FSE
outperformed the $T$-spaced equalizer by several decibels, even when the latter was
adjusted for optimum sampling. The results in these two papers clearly demonstrate
the superior performance achieved with a fractionally spaced equalizer.

FIGURE 9.4-7
$T$ and $\frac{1}{2}T$ equalizer performance as a function of timing phase for 2400 symbols per second:
(a) channel with high-end amplitude distortion (HA); (b) equalizer performance.
(NRF indicates no receiver filter.) [From Qureshi and Forney (1977). © 1977 IEEE.]


9.4-5 Baseband and Passband Linear Equalizers 

The linear equalizer treated above was described in terms of equivalent lowpass signals.
However, in a practical implementation, the linear equalizer shown in Figure 9.4-1 can
be realized either at baseband or at bandpass. For example, Figure 9.4-9 illustrates the
demodulation of QAM or multiphase PSK by first translating the signal to baseband and
equalizing the baseband signal with an equalizer having complex-valued coefficients. In
effect, the equalizer with a complex-valued (in-phase and quadrature components) input
is equivalent to four parallel equalizers with real-valued tap coefficients as shown in
Figure 9.4-10. We generally refer to the equalizer in Figure 9.4-9 as a complex-valued
baseband equalizer.

FIGURE 9.4-8
Performance of $T$ and $\frac{1}{2}T$ equalizers as a function
of timing phase for 2400 symbols/s 16-QAM on a
channel with poor envelope delay. [From Gitlin and
Weinstein (1981). Reprinted with permission from
Bell System Technical Journal. © 1981 AT&T.]

FIGURE 9.4-9
QAM and PSK signal demodulator with baseband equalizer.

As an alternative, we may equalize the signal at passband. This is accomplished
as shown in Figure 9.4-11 for two-dimensional signal constellations such as QAM
and PSK. The received signal is filtered and, in parallel, it is passed through a Hilbert
transformer, called a phase-splitting filter. Thus, we have the equivalent of in-phase and
quadrature components at passband, which are fed to a passband complex equalizer.
We may call this equalizer structure a complex-valued passband equalizer. Following
the equalization, the signal is down-converted to baseband and detected.

The complex-valued baseband equalizer may be implemented either as a symbol 
rate equalizer (SRE) or as a fractionally spaced equalizer (FSE), with the latter being 
preferable in view of its insensitivity to the sampling phase within a symbol interval. 

The complex-valued passband equalizer must be an FSE, with samples of the 
received signal taken at some multiple of the symbol rate that exceeds the Nyquist 
rate. 

An alternative passband FSE to the structure shown in Figure 9.4-11 is illustrated
in Figure 9.4-12. In this FSE, real-valued samples of the received signal are taken
at the Nyquist rate or faster and equalized at bandpass by a linear equalizer that has
complex-valued coefficients. We note that this equalizer structure does not explicitly
implement a Hilbert transformer to perform phase splitting. Instead, the phase-splitting
function is embedded in the equalizer coefficients and, thus, the Hilbert transform is
avoided. This alternative passband FSE structure in Figure 9.4-12 has been called a
phase-splitting FSE (PS-FSE). Its properties and its performance have been investigated
by Mueller and Werner (1982), Im and Un (1987), and Ling and Qureshi (1990).

FIGURE 9.4-10
Complex-valued baseband equalizer for
QAM and PSK signals.

FIGURE 9.4-11
QAM or PSK signal equalization at passband.

FIGURE 9.4-12
Structure of a phase-splitting fractionally spaced equalizer. [From Ling and Qureshi (1990);
© 1990 IEEE.]

9.5 DECISION-FEEDBACK EQUALIZATION

In Section 9.3-2 we developed an equivalent discrete-time model of the channel with ISI
and additive noise, as shown in Figure 9.3-2. We observed that the additive Gaussian
noise in this model is colored. Then we simplified this model by inserting a
noise-whitening filter prior to the equalizer, so that the resulting discrete-time model of the
channel has AWGN as shown in Figure 9.3-3. To recover the information sequence that
is corrupted by ISI, we considered two types of equalization methods, one based on the
MLSE criterion that is efficiently implemented by the Viterbi algorithm and the other
employing a linear transversal filter. We recall that the MLSE is the optimum detector in
the sense that it minimizes the probability of a sequence error while the linear equalizer
is suboptimum.

In this section, we consider a nonlinear type of channel equalizer for mitigat- 
ing the ISI, which is also suboptimum, but whose performance is generally better 
than that of the linear equalizer. The nonlinear equalizer consists of two filters, a 
feedforward filter and a feedback filter, arranged as shown in Figure 9.5-1, and it is 
called a decision-feedback equalizer (DFE). The input to the feedforward filter is the 
received signal sequence. The feedback filter has as its input the sequence of decisions 
on previously detected symbols. Functionally, the feedback filter is used to remove 
that part of the ISI from the present estimated symbol caused by previously detected 
symbols. Since the detector feeds hard decisions to the feedback filter, the DFE is 
nonlinear. 

In the case where the feedforward and feedback filters have infinite-duration 
impulse responses, Price (1972) showed that the optimum feedforward filter in a zero-forcing 
DFE is the noise-whitening filter with system function $1/F^*(1/z^*)$. Hence, in 
the zero-forcing DFE, the feedforward filter whitens the additive noise and results in 
an equivalent discrete-time channel having the system function $F(z)$. 

In our treatment, we focus on finite-duration impulse response filters and apply the 
MSE criterion to optimize their coefficients. 



FIGURE 9.5-1 

Structure of decision-feedback equalizer. 
9.5-1 Coefficient Optimization 

From the description given above, it follows that the equalizer output can be 
expressed as 

$$\hat{I}_k = \sum_{j=-K_1}^{0} c_j v_{k-j} + \sum_{j=1}^{K_2} c_j \tilde{I}_{k-j} \tag{9.5-1}$$

where $\hat{I}_k$ is an estimate of the kth information symbol, $\{c_j\}$ are the tap coefficients 
of the filter, and $\{\tilde{I}_{k-1}, \ldots, \tilde{I}_{k-K_2}\}$ are previously detected symbols. The equalizer is 
assumed to have $(K_1 + 1)$ taps in its feedforward section and $K_2$ in its feedback section. 

Both the peak distortion criterion and the MSE criterion result in a mathematically 
tractable optimization of the equalizer coefficients, as can be concluded from the papers 
by George et al. (1971), Price (1972), Salz (1973), and Proakis (1975). Since the MSE 
criterion is more prevalent in practice, we focus our attention on it. Based on the 
assumption that previously detected symbols in the feedback filter are correct, the 
minimization of MSE 

$$J(K_1, K_2) = E|I_k - \hat{I}_k|^2 \tag{9.5-2}$$

leads to the following set of linear equations for the coefficients of the feedforward 
filter: 

$$\sum_{j=-K_1}^{0} \psi_{lj} c_j = f_{-l}^*, \qquad l = -K_1, \ldots, -1, 0 \tag{9.5-3}$$

where 

$$\psi_{lj} = \sum_{m=0}^{-l} f_m^* f_{m+l-j} + N_0 \delta_{lj}, \qquad l, j = -K_1, \ldots, -1, 0 \tag{9.5-4}$$

The coefficients of the feedback filter of the equalizer are given in terms of the 
coefficients of the feedforward section by the following expression: 

$$c_k = -\sum_{j=-K_1}^{0} c_j f_{k-j}, \qquad k = 1, 2, \ldots, K_2 \tag{9.5-5}$$

The values of the feedback coefficients result in complete elimination of intersymbol 
interference from previously detected symbols, provided that previous decisions are 
correct and that $K_2 \ge L$ (see Problem 9.51). 
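As an illustration of Equations 9.5-3 to 9.5-5, the following minimal sketch solves for the DFE taps of a real-valued channel using only the Python standard library. The function names (`solve_linear`, `dfe_mse_taps`) and the restriction to real taps are assumptions made for this example; the sketch also assumes correct past decisions and $K_2 \ge L$, and it returns the minimum MSE $1 - \sum_j c_j f_{-j}$ alongside the taps.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (real-valued)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def dfe_mse_taps(f, n0, k1, k2):
    """Return (feedforward taps c_{-K1..0}, feedback taps c_{1..K2}, minimum MSE).

    f  : real channel taps [f_0, ..., f_L] (hypothetical input; conjugates omitted)
    n0 : noise variance N_0; k1, k2 : K_1 and K_2 of Eq. 9.5-1, with K_2 >= L assumed
    """
    def tap(i):
        return f[i] if 0 <= i < len(f) else 0.0

    idx = list(range(-k1, 1))                      # l, j = -K1, ..., 0
    psi = [[sum(tap(m) * tap(m + l - j) for m in range(0, -l + 1))
            + (n0 if l == j else 0.0) for j in idx] for l in idx]   # Eq. 9.5-4
    rhs = [tap(-l) for l in idx]                   # right-hand side of Eq. 9.5-3
    ff = solve_linear(psi, rhs)
    fb = [-sum(c * tap(k - j) for c, j in zip(ff, idx))
          for k in range(1, k2 + 1)]               # Eq. 9.5-5
    j_min = 1.0 - sum(c * tap(-j) for c, j in zip(ff, idx))
    return ff, fb, j_min

if __name__ == "__main__":
    s = math.sqrt(0.5)
    for k1 in (0, 1, 5, 30):
        _, fb, j_min = dfe_mse_taps([s, s], 0.1, k1, 1)
        print(k1, round(j_min, 6), [round(c, 6) for c in fb])
```

For an ideal channel (`f = [1.0]`) the sketch reduces to a single tap $c_0 = 1/(1 + N_0)$, which is a convenient sanity check before trying channels with ISI.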


9.5-2 Performance Characteristics of DFE 

We now turn our attention to the performance achieved with decision-feedback equal- 
ization. The exact evaluation of the performance is complicated to some extent by 
occasional incorrect decisions made by the detector, which then propagate down the 
feedback section. In the absence of decision errors, the minimum MSE is given as 

$$J_{\min}(K_1) = 1 - \sum_{j=-K_1}^{0} c_j f_{-j} \tag{9.5-6}$$

By going to the limit ($K_1 \to \infty$) of an infinite number of taps in the feedforward filter, 
we obtain the smallest achievable MSE, denoted as $J_{\min}$. With some effort $J_{\min}$ can be 
expressed in terms of the spectral characteristics of the channel and additive noise, as 
shown by Salz (1973). This more desirable form for $J_{\min}$ is 

$$J_{\min} = \exp\left\{\frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \ln\left[\frac{N_0}{X(e^{j\omega T}) + N_0}\right] d\omega\right\} \tag{9.5-7}$$

The corresponding output SNR is 

$$\gamma_\infty = \frac{1 - J_{\min}}{J_{\min}} = -1 + \exp\left\{\frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \ln\left[\frac{N_0 + X(e^{j\omega T})}{N_0}\right] d\omega\right\} \tag{9.5-8}$$

We observe again that, in the absence of intersymbol interference, $X(e^{j\omega T}) = 1$, 
and hence, $J_{\min} = N_0/(1 + N_0)$. The corresponding output SNR is $\gamma_\infty = 1/N_0$. 


EXAMPLE 9.5-1. It is interesting to compare the value of $J_{\min}$ for the decision-feedback 
equalizer with the value of $J_{\min}$ obtained with the linear MSE equalizer. For example, 
let us consider the discrete-time equivalent channel consisting of two taps $f_0$ and $f_1$. 
The minimum MSE for this channel is 

$$J_{\min} = \exp\left\{\frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \ln\left[\frac{N_0}{1 + N_0 + 2|f_0||f_1|\cos(\omega T + \theta)}\right] d\omega\right\}$$

$$= N_0 \exp\left[-\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln(1 + N_0 + 2|f_0||f_1|\cos\omega)\, d\omega\right]$$

$$= \frac{2N_0}{1 + N_0 + \sqrt{(1 + N_0)^2 - 4|f_0 f_1|^2}} \tag{9.5-9}$$

Note that $J_{\min}$ is maximized when $|f_0| = |f_1| = \sqrt{1/2}$. Then 

$$J_{\min} = \frac{2N_0}{1 + N_0 + \sqrt{(1 + N_0)^2 - 1}} \approx 2N_0, \qquad N_0 \ll 1 \tag{9.5-10}$$

The corresponding output SNR is 

$$\gamma_\infty \approx \frac{1}{2N_0}, \qquad N_0 \ll 1 \tag{9.5-11}$$

Therefore, there is a 3-dB degradation in output SNR due to the presence of intersymbol 
interference. In comparison, the performance loss for the linear equalizer is very severe. 
Its output SNR as given by Equation 9.4-53 is $\gamma_\infty \approx (2/N_0)^{1/2}$ for $N_0 \ll 1$. 
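The closed form in Equation 9.5-9 can be checked numerically against the integral in Equation 9.5-7. The sketch below is only an illustration, assuming $T = 1$, a unit-energy two-tap channel (so that $X(e^{j\omega}) = 1 + 2|f_0||f_1|\cos\omega$), and a simple midpoint rule; the function names are hypothetical.

```python
import math

def log_integral_mean(g, steps=20000):
    """(1/2pi) * integral over [-pi, pi] of ln g(w) dw, by the midpoint rule."""
    h = 2.0 * math.pi / steps
    return sum(math.log(g(-math.pi + (i + 0.5) * h)) for i in range(steps)) / steps

def dfe_jmin(x_spectrum, n0):
    """Numerical evaluation of Eq. 9.5-7 with T = 1."""
    return math.exp(log_integral_mean(lambda w: n0 / (x_spectrum(w) + n0)))

def two_tap_closed_form(n0, f0f1=0.5):
    """Closed form of Eq. 9.5-9; f0f1 = |f0||f1| for a unit-energy channel."""
    return 2.0 * n0 / (1.0 + n0 + math.sqrt((1.0 + n0) ** 2 - 4.0 * f0f1 ** 2))

if __name__ == "__main__":
    n0 = 0.01
    j_num = dfe_jmin(lambda w: 1.0 + math.cos(w), n0)   # |f0| = |f1| = sqrt(1/2)
    j_cf = two_tap_closed_form(n0)
    snr = (1.0 - j_num) / j_num                          # Eq. 9.5-8
    loss_db = 10.0 * math.log10((1.0 / n0) / snr)        # relative to the no-ISI SNR 1/N0
    print(j_num, j_cf, snr, loss_db)
```

The loss relative to the ISI-free SNR $1/N_0$ approaches 3 dB only as $N_0 \to 0$; at moderate noise levels the computed loss is somewhat smaller.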
EXAMPLE 9.5-2. Consider the exponentially decaying channel characteristic of the 
form 

$$f_k = (1 - a^2)^{1/2} a^k, \qquad k = 0, 1, 2, \ldots \tag{9.5-12}$$

where $a < 1$. The output SNR of the decision-feedback equalizer is 

$$\gamma_\infty = -1 + \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \ln\left[\frac{1 + a^2 + (1 - a^2)/N_0 - 2a\cos\omega}{1 + a^2 - 2a\cos\omega}\right] d\omega\right\}$$

$$= -1 + \frac{1}{2N_0}\left\{1 - a^2 + N_0(1 + a^2) + \sqrt{[1 - a^2 + N_0(1 + a^2)]^2 - 4a^2 N_0^2}\right\}$$

$$\approx \frac{(1 - a^2)[1 + N_0(1 + a^2)/(1 - a^2)] - N_0}{N_0}$$

$$\approx \frac{1 - a^2}{N_0}, \qquad N_0 \ll 1 \tag{9.5-13}$$

Thus, the loss in SNR is $10\log_{10}(1 - a^2)$ dB. In comparison, the linear equalizer has 
a loss of $10\log_{10}[(1 - a^2)/(1 + a^2)]$ dB. 
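The steps of Equation 9.5-13 can be reproduced numerically. The following sketch, with hypothetical function names and $T = 1$ assumed, evaluates the integral form of the output SNR for the exponential channel, compares it with the exact closed form, and tabulates the two SNR-loss expressions quoted above.

```python
import math

def dfe_output_snr(a, n0, steps=20000):
    """Integral form of Eq. 9.5-13 (midpoint rule) for f_k = sqrt(1 - a^2) a^k."""
    h = 2.0 * math.pi / steps
    acc = 0.0
    for i in range(steps):
        w = -math.pi + (i + 0.5) * h
        x = (1.0 - a * a) / (1.0 + a * a - 2.0 * a * math.cos(w))  # folded spectrum
        acc += math.log((n0 + x) / n0)
    return -1.0 + math.exp(acc / steps)

def dfe_output_snr_closed(a, n0):
    """Second line of Eq. 9.5-13 (exact closed form)."""
    s = 1.0 - a * a + n0 * (1.0 + a * a)
    return -1.0 + (s + math.sqrt(s * s - 4.0 * a * a * n0 * n0)) / (2.0 * n0)

def dfe_loss_db(a):
    """High-SNR loss of the DFE relative to 1/N0, in dB."""
    return 10.0 * math.log10(1.0 - a * a)

def linear_eq_loss_db(a):
    """High-SNR loss of the linear equalizer quoted in the text, in dB."""
    return 10.0 * math.log10((1.0 - a * a) / (1.0 + a * a))

if __name__ == "__main__":
    for a in (0.3, 0.5, 0.9):
        print(a, dfe_output_snr(a, 0.01), dfe_output_snr_closed(a, 0.01),
              dfe_loss_db(a), linear_eq_loss_db(a))
```

Because $(1 - a^2)/(1 + a^2) < 1 - a^2$ for any $0 < a < 1$, the linear equalizer's loss is always the larger of the two under these formulas.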

These results illustrate the superiority of the decision-feedback equalizer over the 
linear equalizer when the effect of decision errors on performance is neglected. It 
is apparent that a considerable gain in performance can be achieved relative to the 
linear equalizer by the inclusion of the decision-feedback section, which eliminates the 
intersymbol interference from previously detected symbols. 

One method of assessing the effect of decision errors on the error rate performance 
of the decision-feedback equalizer is Monte Carlo simulation on a digital computer. 
For purposes of illustration, we offer the following results for binary PAM signaling 
through the equivalent discrete-time channel models shown in Figure 9.4-5b and c. 

The results of the simulation are displayed in Figure 9.5-2. First of all, a comparison 
of these results with those presented in Figure 9.4-4 leads us to conclude that the 
decision-feedback equalizer yields a significant improvement in performance relative to 
the linear equalizer having the same number of taps. Second, these results indicate that 
there is still a significant degradation in performance of the decision-feedback equal- 
izer due to the residual intersymbol interference, especially on channels with severe 
distortion such as the one shown in Figure 9.4-5c. Finally, the performance loss due 
to incorrect decisions being fed back is 2 dB, approximately, for the channel responses 
under consideration. Additional results on the probability of error for a decision- 
feedback equalizer with error propagation may be found in the papers by Duttweiler 
et al. (1974) and Beaulieu (1994). 

The structure of the DFE that is analyzed above employs a T-spaced filter for the 
feedforward section. The optimality of such a structure is based on the assumption that 
the analog filter preceding the DFE is matched to the channel-corrupted pulse response 
and its output is sampled at the optimum time instant. In practice, the channel response 
is not known a priori, so it is not possible to design an ideal matched filter. In view 
of this difficulty, it is customary in practical applications to use a fractionally spaced 
feedforward filter. Of course, the feedback filter tap spacing remains at T. The use of 
the FSE for the feedforward filter eliminates the system sensitivity to a timing error. 
FIGURE 9.5-2 

Performance of decision-feedback equalizer with and without error propagation. 

Performance comparison with the MLSE. We conclude this subsection on the 
performance of the DFE by comparing its performance against that of the MLSE. For 
the two-path channel with $f_0 = f_1 = \sqrt{1/2}$, we have shown that the MLSE suffers no 
SNR loss while the decision-feedback equalizer suffers a 3-dB loss. On channels with 
more distortion, the SNR advantage of the MLSE over decision-feedback equalization 
is even greater. Figure 9.5-3 illustrates a comparison of the error rate performance 
of these two equalization techniques, obtained via Monte Carlo simulation, for binary 
PAM and the channel characteristics shown in Figure 9.4-5b and c. The error rate curves 
for the two methods have different slopes; hence the difference in SNR increases as 
the error probability decreases. As a benchmark, the error rate for the AWGN channel 
with no intersymbol interference is also shown in Figure 9.5-3. 


9.5-3 Predictive Decision-Feedback Equalizer 

Belfiore and Park (1979) proposed another DFE structure that is equivalent to the one 
shown in Figure 9.5-1 under the condition that the feedforward filter has an infinite 
number of taps. This structure consists of an FSE as a feedforward filter and a linear 
predictor as a feedback filter, as shown in the configuration given in Figure 9.5-4. Let 
us briefly consider the performance characteristics of this equalizer, based on the MSE 
criterion. 


FIGURE 9.5-3 

Comparison of performance between MLSE and decision-feedback equalization for channel 
characteristics shown (a) in Figure 9.4-5b and (b) in Figure 9.4-5c. (Vertical axis: probability of error.) 


First of all, the noise at the output of the infinite length feedforward filter has the 
power spectral density 

$$\frac{N_0 X(e^{j\omega T})}{|N_0 + X(e^{j\omega T})|^2}, \qquad |\omega| \le \frac{\pi}{T} \tag{9.5-14}$$

The residual intersymbol interference has the power spectral density 

$$\left|1 - \frac{X(e^{j\omega T})}{N_0 + X(e^{j\omega T})}\right|^2 = \frac{N_0^2}{|N_0 + X(e^{j\omega T})|^2}, \qquad |\omega| \le \frac{\pi}{T} \tag{9.5-15}$$


FIGURE 9.5-4 

Block diagram of predictive DFE. 
The sum of these two spectra represents the power spectral density of the total noise 
and intersymbol interference at the output of the feedforward filter. Thus, on adding 
Equations 9.5-14 and 9.5-15, we obtain 

$$|E_t(\omega)|^2 = \frac{N_0}{N_0 + X(e^{j\omega T})}, \qquad |\omega| \le \frac{\pi}{T} \tag{9.5-16}$$


As we have observed previously, if $X(e^{j\omega T}) = 1$, the channel is ideal and, hence, 
it is not possible to reduce the MSE any further. On the other hand, if there is channel 
distortion, the power in the error sequence at the output of the feedforward filter can be 
reduced by means of linear prediction based on past values of the error sequence. 

If $B(\omega)$ represents the frequency response of the infinite length feedback predictor, 
i.e., 

$$B(\omega) = \sum_{n=1}^{\infty} b_n e^{-j\omega nT} \tag{9.5-17}$$

then the error at the output of the predictor is 

$$E_p(\omega) = E_t(\omega) - E_t(\omega)B(\omega) = E_t(\omega)[1 - B(\omega)] \tag{9.5-18}$$

The minimization of the mean square value of this error, i.e., 

$$J = \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} |1 - B(\omega)|^2 |E_t(\omega)|^2\, d\omega \tag{9.5-19}$$

over the predictor coefficients $\{b_n\}$ yields the optimum predictor in the form 

$$B(\omega) = 1 - \frac{G(\omega)}{g_0} \tag{9.5-20}$$

where $G(\omega)$ is the solution to the spectral factorization 

$$G(\omega)G^*(-\omega) = \frac{1}{|E_t(\omega)|^2} \tag{9.5-21}$$

and 

$$G(\omega) = \sum_{n=0}^{\infty} g_n e^{-j\omega nT} \tag{9.5-22}$$

The output of the infinite length linear predictor is a white noise sequence with power 
spectral density $1/g_0^2$, and the corresponding minimum MSE is given by Equation 9.5-7. 
Therefore, the MSE performance of the infinite length predictive DFE is identical to 
the conventional DFE. 
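The equivalence stated above can be traced in a few lines. The derivation below is a sketch under stated assumptions: the factorization in Equation 9.5-21 is read as $|G(\omega)|^2 = 1/|E_t(\omega)|^2$ on the unit circle, and $G$ is taken to be the causal, minimum-phase factor with $g_0$ real and positive, so that the standard log-integral expression for $g_0^2$ applies.

```latex
\begin{align*}
|E_p(\omega)|^2
  &= |E_t(\omega)|^2\,|1 - B(\omega)|^2
   = |E_t(\omega)|^2\,\frac{|G(\omega)|^2}{g_0^2}
   = \frac{1}{g_0^2}, \\[4pt]
J_{\min}
  &= \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \frac{1}{g_0^2}\,d\omega
   = \frac{1}{g_0^2}, \\[4pt]
\ln g_0^2
  &= \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} \ln |G(\omega)|^2\,d\omega
   = \frac{T}{2\pi}\int_{-\pi/T}^{\pi/T}
     \ln\!\left[\frac{N_0 + X(e^{j\omega T})}{N_0}\right] d\omega, \\[4pt]
J_{\min}
  &= \exp\left\{\frac{T}{2\pi}\int_{-\pi/T}^{\pi/T}
     \ln\!\left[\frac{N_0}{N_0 + X(e^{j\omega T})}\right] d\omega\right\}.
\end{align*}
```

The first line shows that the predictor output is white, and the last line coincides with Equation 9.5-7.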

Although these two DFE structures result in equivalent performance if their lengths 
are infinite, the predictive DFE is suboptimum if the lengths of the two filters are 
finite. The reason for the optimality of the conventional DFE is relatively simple. 
The optimization of its tap coefficients in the feedforward and feedback filters is 
done jointly. Hence, it yields the minimum MSE. On the other hand, the optimiza- 
tions of the feedforward filter and the feedback predictor in the predictive DFE are 
done separately. Hence, its MSE is at least as large as that of the conventional DFE. 
In spite of this suboptimality of the predictive DFE, it is suitable as an equalizer for 
trellis-coded signals, where the conventional DFE is not as suitable, as described in the 
next chapter. 


9.5-4 Equalization at the Transmitter: Tomlinson-Harashima Precoding 

If the channel response is known to the transmitter, the equalizer can be placed at 
the transmitter end of the communication system. Thus, the noise enhancement that 
is generally inherent when the equalizer (linear or DFE) is placed at the receiver is 
avoided. In practice, however, channel characteristics generally vary with time, so it is 
cumbersome to place the entire equalizer at the transmitter. 

In wireline channels, the channel characteristics do not vary significantly with time. 
Therefore, it is possible to place the feedback filter of the DFE at the transmitter and 
the feedforward filter at the receiver. This approach has the advantage that the problem 
of error propagation due to incorrect decisions in the feedback filter is completely 
eliminated. Thus, the tail (postcursors) in the channel response is cancelled without 
any penalty in the SNR. The linear fractionally spaced feedforward part of the DFE, 
which ideally is the WMF, can be designed to compensate for ISI that results from any 
small time variation in the channel response. The feedback filter of the DFE is usually 
synthesized at the transmitter after the receiver has measured the channel response from 
a transmitted channel probe signal and has sent the coefficients of the feedback filter 
back to the transmitter. 

The one problem with this approach to implementing the DFE is that the signal 
points at the transmitter, after subtracting the postcursors of the ISI, generally have a 
larger dynamic range than the original signal constellation and, consequently, require 
a larger transmitter power. This problem can be avoided by precoding the information 
symbols prior to transmission as described by Tomlinson (1971) and Harashima and 
Miyakawa (1972). 

We describe the precoding technique for a PAM signal constellation. Since a square 
QAM signal constellation may be viewed as two PAM signal sets on quadrature carriers, 
the precoding is easily extended to QAM. For simplicity, we assume that the feedforward 
filter in the DFE is the WMF and that the channel response, characterized by the 
parameters $\{f_i, 0 \le i \le L\}$, is perfectly known to the transmitter and the receiver. The 
information symbols $\{I_k\}$ are assumed to take the values $\{\pm 1, \pm 3, \ldots, \pm(M - 1)\}$. 

In the precoding, the ISI due to the postcursors $\{f_i, 1 \le i \le L\}$ is subtracted from 
the symbol to be transmitted and, if the difference falls outside of the range $(-M, M]$, 
it is reduced to the range by subtracting an integer multiple of $2M$ from this difference. 
Hence, the precoder output may be expressed as 

$$a_k = I_k - \sum_{j=1}^{L} f_j a_{k-j} + 2M b_k \tag{9.5-23}$$

where $\{b_k\}$ represents the appropriate integer that brings $\{a_k\}$ to the desired range. In 
other words, $\{a_k\}$ is reduced to the desired range by performing a modulo-$2M$ operation. 

The modulo operation is defined mathematically by the function 

$$m_y(x) = x - yz$$

where $y > 0$ and $z = \left\lfloor (x + y/2)/y \right\rfloor$ is the unique integer such that $m_y(x) \in [-y/2, y/2)$. In 
our case $y = 2M$. 

FIGURE 9.5-5 

Tomlinson-Harashima precoding. (Blocks: precoder, channel, detector/decoder.) 

By using the z transform to describe the operation of the precoder, 
we have 


$$A(z) = I(z) - [F(z) - 1]A(z) + 2MB(z) \tag{9.5-24}$$

where the channel coefficient $f_0$ is normalized to unity for convenience. Hence, the 
transmitted sequence is 

$$A(z) = \frac{I(z) + 2MB(z)}{F(z)} \tag{9.5-25}$$

Since the channel response is $F(z)$, the received signal sequence may be expressed as 

$$V(z) = A(z)F(z) + W(z) = [I(z) + 2MB(z)] + W(z) \tag{9.5-26}$$

where $W(z)$ represents the AWGN term. Therefore, the received data sequence term 
$I(z) + 2MB(z)$ at the input to the detector is free of ISI and $I(z)$ can be recovered from 
$V(z)$ by use of a symbol-by-symbol detector that decodes the symbols modulo-$2M$. 
Figure 9.5-5 illustrates the block diagram of the system that implements the precoder 
and the feedback filter of the DFE at the transmitter. 

The placement of the feedback filter at the transmitter makes it possible to use 
the DFE in conjunction with trellis-coded modulation (TCM). Since the equalizer at 
the receiver is a linear filter, decisions from the output of the Viterbi (TCM) detector 
can be used to adjust the coefficients of the equalizer. In this case, the Viterbi detector 
performs the modulo-2 M operations in its metric computations. 
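The precoding and modulo detection described in this subsection can be sketched in a few lines of Python. This is a noise-free illustration only, assuming real M-ary PAM symbols, a channel with $f_0 = 1$, and the floor-based modulo function; the helper names (`mod_centered`, `th_precode`, `channel`, `th_detect`) and the example channel taps are hypothetical.

```python
import math

def mod_centered(x, y):
    """m_y(x) = x - y*floor((x + y/2)/y), which lies in [-y/2, y/2)."""
    return x - y * math.floor((x + y / 2) / y)

def th_precode(symbols, f, m_levels):
    """Precoder: a_k = I_k - sum_{j>=1} f_j a_{k-j} + 2M b_k, with f[0] == 1."""
    a = []
    for k, i_k in enumerate(symbols):
        isi = sum(f[j] * a[k - j] for j in range(1, len(f)) if k - j >= 0)
        a.append(mod_centered(i_k - isi, 2 * m_levels))
    return a

def channel(a, f):
    """Noise-free channel output v_k = sum_j f_j a_{k-j}."""
    return [sum(f[j] * a[k - j] for j in range(len(f)) if k - j >= 0)
            for k in range(len(a))]

def th_detect(v, m_levels):
    """Reduce modulo 2M, then slice to the nearest odd level in +/-1, ..., +/-(M-1)."""
    out = []
    for x in v:
        r = mod_centered(x, 2 * m_levels)
        level = 2 * math.floor(r / 2) + 1            # nearest odd integer
        out.append(max(-(m_levels - 1), min(m_levels - 1, level)))
    return out

if __name__ == "__main__":
    f = [1.0, 0.6, -0.3]                             # hypothetical channel, f_0 = 1
    symbols = [3, -1, 1, -3, 3, 3, -3, 1, -1, 3, -3, -3, 1]
    a = th_precode(symbols, f, 4)
    print(max(abs(x) for x in a))                    # stays within the range [-M, M)
    print(th_detect(channel(a, f), 4) == symbols)
```

In this sketch the transmitted values stay within $[-M, M)$ regardless of the channel taps, which is the point of the modulo operation; with additive noise, received values near the modulo boundary can wrap around, an effect the noise-free example does not show.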


■ 9.6 

REDUCED COMPLEXITY ML DETECTORS 

The performance results of the three basic equalization methods described above, 
namely, MLSE, linear equalization (LE), and decision-feedback equalization (DFE), 
clearly show the superiority of MLSE in channels with severe ISI. Such channels are 
encountered in wireless communications and in high-density magnetic recording systems. 
The performance advantage of MLSE has motivated a significant amount of research 
on methods that retain the performance characteristics of MLSE, but do so at a reduced 
complexity. 

The early work on the design of reduced complexity MLSE focused on methods 
that reduce the length of the ISI span by preprocessing the received signal prior to the 
maximum-likelihood detector. Falconer and Magee (1973) and Beare (1978) used a 
linear equalizer to reduce the span of the ISI to some small specified length prior to 
the Viterbi detector. Lee and Hill (1977) employed a DFE in place of the LE. Thus, the 
large ISI span in the channel is reduced to a sufficiently small length, called the desired 
impulse response, so that the complexity of the Viterbi detector following the LE or 
DFE is manageable. We may view this role of the LE or the DFE, prior to the Viterbi 
detector, as equalizing the channel response to a specified partial-response characteristic 
of short duration (the desired impulse response) which the Viterbi detector can handle 
with sufficiently small complexity. The choice of the desired impulse response is tailored 
to the ISI characteristics of the channel. This approach to reducing the complexity of 
the Viterbi detector has proved to be very effective in high-density magnetic recording 
systems, as illustrated in the papers by Siegel and Wolf (1991), Tyner and Proakis 
(1993), Moon and Carley (1988), and Proakis (1998). 

Another general approach is to reduce the complexity of the Viterbi detector 
directly, by reducing the number of surviving sequences. The papers by Vermeulen and 
Hellman (1974), Fredricsson (1974), and Foschini (1977) describe algorithms that 
reduce the number of surviving sequences in the Viterbi detector. Other works on this class 
of methods include the papers by Clark et al. (1984, 1985) and Wesolowski (1987a). 

The most effective approach in terms of performance for reducing the complexity 
of the Viterbi detector directly is the method described in the papers by Bergmans 
et al. (1987), Eyuboglu and Qureshi (1988), and Duel-Hallen and Heegard (1989). The 
filter preceding the Viterbi detector is the whitened matched filter (WMF) described 
previously. The WMF reduces the channel to one that has a minimum phase charac- 
teristic. The basic algorithm described in these papers for reducing the computational 
complexity of the Viterbi detector employs decision feedback within the Viterbi detector 
to reduce the effective length of the ISI from $L$ terms to $L_0$ terms, where $L_0 < L$. 
This may be accomplished in one of two ways, as described by Bergmans et al. (1987), 
either by using “global feedback” or “local feedback” from preliminary decisions that 
are present in the Viterbi detector. The use of global feedback is illustrated in Fig- 
ure 9.6-1, where preliminary decisions obtained by using the most probable surviving 
sequence from the Viterbi detector are used to synthesize the tail in the ISI due to the 
channel coefficients $(f_{L_0+1}, f_{L_0+2}, \ldots, f_{L-1}, f_L)$. Thus, for M-ary modulations, the 
computational complexity of the Viterbi detector is reduced from $M^L$ to $M^{L_0}$, which 
amounts to a reduction by the factor $M^{L-L_0}$. The primary drawback of using global 
feedback is that if one or more of the symbols $\tilde{I}_{k-L_0-1}, \ldots, \tilde{I}_{k-L}$ in the most probable 
surviving sequence are incorrect, the subtraction of the tail in the ISI is also incorrect 
and, thus, the metric computations are corrupted by the residual ISI resulting from this 
imperfect cancellation. 

To remedy this problem, one may use the preliminary decisions corresponding 
to each surviving sequence to cancel the ISI in the tail of the corresponding surviving 
sequence. Thus, the ISI will be perfectly cancelled when the correct sequence is 
among the surviving sequences, even if it is not the most probable sequence. Bergmans 
et al. (1987) described this approach as using “local feedback” to perform the tail 
cancellation. 

FIGURE 9.6-1 

Reduced complexity ML sequence detector using feedback from the Viterbi detector. (a) Block diagram of symbol detector. (b) Channel response; the tail is to be eliminated by the feedback filter. 

It is interesting to note that if $L_0$ is selected as unity ($L_0 = 1$), the Viterbi detector 
reduces to the simple feedback filter of a conventional DFE. At the other extreme, when 
$L_0 = L$, we have a full complexity Viterbi detector. The analytical and simulation results 
given in the paper by Bergmans et al. (1987) clearly illustrate that local feedback gives 
superior performance to global feedback. 
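To give a feel for the savings, the short sketch below evaluates the complexity measures quoted above, $M^L$ for the full detector and $M^{L_0}$ for the reduced detector. The function names and the sample values of $M$, $L$, and $L_0$ are assumptions chosen only for illustration.

```python
def viterbi_complexity(m, span):
    """Complexity measure M**span used in the text for an ISI span of `span` terms."""
    return m ** span

def reduction_factor(m, l, l0):
    """Reduction factor M**(L - L0) obtained by feeding back the ISI tail."""
    return m ** (l - l0)

if __name__ == "__main__":
    m, l = 4, 6                      # hypothetical: 4-ary modulation, ISI span L = 6
    for l0 in range(1, l + 1):
        print(l0, viterbi_complexity(m, l0), reduction_factor(m, l, l0))
```

With $L_0 = L$ the reduction factor is 1, corresponding to the full complexity Viterbi detector.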


■ 9.7 

ITERATIVE EQUALIZATION AND DECODING: TURBO EQUALIZATION 

Iterative decoding and the turbo-coding principle that was described in Section 8.7 can 
be applied to channel equalization. Suppose the transmitter of a digital communica- 
tion system employs a binary systematic convolutional encoder followed by a block 
interleaver and a modulator. The channel is a linear time-dispersive channel that intro- 
duces ISI. In such a case, we may view the channel as an inner encoder in a serially 
concatenated code. Hence, we can apply iterative decoding based on the MAP criterion. 

The basic configuration of the iterative equalizer-decoder is shown in Figure 9.7-1. 
The input to the MAP equalizer is the sequence $\{v_k\}$ from the WMF. The 
equalizer computes the logarithm of the likelihood ratio of the coded bits, denoted as 
$L^E(x)$, which represents the a posteriori values of the coded bits. The outer decoder 
receives as an input the extrinsic part of $L^E(x)$, which is defined as 

$$L_e^E(x) = L^E(x) - L_e^D(x) \tag{9.7-1}$$

where $L_e^D(x)$ is the extrinsic part of the outer decoder output after interleaving. $L_e^E(x)$ 
is deinterleaved prior to being fed to the outer decoder. 

FIGURE 9.7-1 

Iterative equalization and decoding. 

The outer decoder computes the logarithm of the likelihood ratio of the coded bits, 
denoted by $L^D(x')$, and of the information bits, denoted as $L^D(I)$. The extrinsic part of 
$L^D(x')$, denoted as $L_e^D(x')$, is the incremental information about the current bit obtained 
by the decoder after observing all the information for all the received bits. The extrinsic 
information is computed as 

$$L_e^D(x') = L^D(x') - L_e^E(x') \tag{9.7-2}$$

$L_e^D(x')$ is interleaved to produce $L_e^D(x)$ and fed to the MAP equalizer. We emphasize 
the importance of feeding back only the extrinsic part $L_e^D(x)$, thus minimizing the 
correlation between the a priori information used by the equalizer and previous 
equalizer outputs. Similarly, we reduce the a posteriori information $L^E(x)$ by the a priori 
information values $L_e^D(x)$ to obtain the extrinsic information value $L_e^E(x)$, which is fed 
to the outer decoder after deinterleaving. 
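The exchange of extrinsic information in Equations 9.7-1 and 9.7-2 can be sketched as a single pass of the loop. In the sketch below the MAP equalizer and the outer decoder are represented by hypothetical callables (`equalizer_llr`, `decoder_llr`) that return a posteriori log-likelihood ratios; their internals are not shown and are not taken from the text. The interleaver is modeled as a permutation list, which is also an assumption.

```python
def interleave(values, perm):
    """Interleaver: output position i takes the value at input position perm[i]."""
    return [values[p] for p in perm]

def deinterleave(values, perm):
    """Inverse of interleave for the same permutation."""
    out = [0.0] * len(values)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out

def extrinsic(posterior, prior):
    """Subtract the a priori LLRs from the a posteriori LLRs, element by element."""
    return [a - b for a, b in zip(posterior, prior)]

def one_iteration(equalizer_llr, decoder_llr, perm, prior_from_decoder):
    """One equalizer/decoder pass; returns the a priori LLRs for the next pass."""
    # Eq. 9.7-1: extrinsic part of the equalizer output, in interleaved order.
    le_e = extrinsic(equalizer_llr(prior_from_decoder), prior_from_decoder)
    le_e_deint = deinterleave(le_e, perm)
    # Eq. 9.7-2: extrinsic part of the decoder output, in coded-bit order.
    le_d = extrinsic(decoder_llr(le_e_deint), le_e_deint)
    return interleave(le_d, perm)

if __name__ == "__main__":
    perm = [2, 0, 3, 1]
    chan = [0.5, -1.0, 2.0, -0.25]
    # Stub modules, for illustration only (not MAP algorithms).
    eq = lambda prior: [p + c for p, c in zip(prior, chan)]
    dec = lambda llr: [2.0 * v for v in llr]
    print(one_iteration(eq, dec, perm, [0.0] * 4))
```

Only the extrinsic values cross between the two modules, which is the decorrelation property emphasized above.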

The computation of the log-likelihood ratios is described in the paper by Bauch 
et al. (1997). The power of this iterative equalization-decoding scheme can be assessed 
from the performance results given in this paper. Figure 9.7-2 illustrates the bit error 
probability obtained through simulation of the five-tap time-invariant channel given in 
Figure 9.4-5c. The outer code used is a rate 1/2 recursive systematic convolutional 
code with constraint length K = 5. The interleaver used was a pseudorandom block 
interleaver of length N = 4096 bits. Binary PSK was used for modulation. The graph 
illustrates the performance gain as the number of iterations is increased. We observe 
that after six iterations, the performance of the iterative equalizer-decoder is within 
0.8 dB of the performance of the encoded data without ISI, at a bit error probability 
of $10^{-4}$. Hence, the iterative equalizer eliminates nearly the entire loss due to ISI. In 
contrast, the optimum (noniterative) Viterbi detector for this channel suffers a loss of 
approximately 7 dB, due to ISI, as can be observed from Figure 9.5-3b. Therefore, 
the iterative equalizer has achieved a performance gain of about 6 dB, aside from the 
coding gain due to the convolutional code. The performance of this method of iterative 
equalization has been evaluated for cellular radio channels by Bauch et al. (1998). An 
implementation of iterative equalization-decoding using nonlinear circuits is described 
in a paper by Hagenauer et al. (1999). 

FIGURE 9.7-2 

Channel taps and bit error rate for a time-invariant channel. (Horizontal axis: $\gamma_b$, SNR/bit in dB.) [From Bauch et al. (1997).] 

An alternative approach to iterative equalization-decoding is to employ a parallel 
concatenated code (turbo code) followed by a block interleaver and a modulator at the 
transmitter side. The receiver employs a MAP equalizer followed by a turbo decoder. 
The extrinsic information generated by the turbo decoder is fed back to the MAP 
equalizer. Thus, we have an iterative equalizer-turbo decoder structure, which is called 
a turbo equalizer. Turbo equalization is treated by Raphaeli and Zarai (1998) and 
Douillard et al. (1995). 


■ 9.8 

BIBLIOGRAPHICAL NOTES AND REFERENCES 

The pioneering work on signal design for bandwidth-constrained channels was done 
by Nyquist (1928). The use of binary partial-response signals was originally pro- 
posed by Lender (1963) and was later generalized by Kretzmer (1966). Other early 
work on problems dealing with intersymbol interference (ISI) and transmitter and re- 
ceiver optimization with constraints on ISI was done by Gerst and Diamond (1961), 
Tufts (1965), Smith (1965), and Berger and Tufts (1967). “Faster than Nyquist” transmission 
has been studied by Mazo (1975) and Foschini (1984). 

Channel equalization for digital communications was developed by Lucky (1965, 
1966), who focused on linear equalizers that were optimized using the peak distortion 
criterion. The mean-square-error criterion for optimization of the equalizer coefficients 
was proposed by Widrow (1966). 

Decision-feedback equalization was proposed and analyzed by Austin (1967). 
Analyses of the performance of the DFE can be found in the papers by Monsen (1971), 
George et al. (1971), Price (1972), Salz (1973), Duttweiler et al. (1974), and Altekar 
and Beaulieu (1993). 

The use of the Viterbi algorithm as the optimal maximum-likelihood sequence 
estimator for symbols corrupted by ISI was proposed and analyzed by Forney (1972) 
and Omura (1971). Its use for carrier-modulated signals was considered by Ungerboeck 
(1974) and MacKenchnie (1973). 

The use of iterative MAP algorithms in suppressing ISI in coded systems, called 
turbo equalization, represents a major new advance in suppression of intersymbol 
interference in signal transmission through band-limited channels. It is anticipated 
that iterative MAP equalization algorithms will be incorporated in future communi- 
cation systems. The implementation of turbo equalization, described in the paper by 
Hagenauer et al. (1999), is the first attempt at implementing an iterative MAP equal- 
ization algorithm in a coded system. 


PROBLEMS 

9.1 A channel is said to be distortionless if the response $y(t)$ to an input $x(t)$ is $Kx(t - t_0)$, 
where $K$ and $t_0$ are constants. Show that if the frequency response of the channel is 
$A(f)e^{j\theta(f)}$, where $A(f)$ and $\theta(f)$ are real, the necessary and sufficient conditions for 
distortionless transmission are $A(f) = K$ and $\theta(f) = 2\pi f t_0 \pm n\pi$, $n = 0, 1, 2, \ldots$. 

9.2 The raised cosine spectral characteristic is given by Equation 9.2-26. 

a. Show that the corresponding impulse response is 

$$x(t) = \frac{\sin(\pi t/T)}{\pi t/T} \cdot \frac{\cos(\beta\pi t/T)}{1 - 4\beta^2 t^2/T^2}$$

b. Determine the Hilbert transform of $x(t)$ when $\beta = 1$. 

c. Does $\hat{x}(t)$ possess the desirable properties of $x(t)$ that make it appropriate for data 
transmission? Explain. 

d. Determine the envelope of the SSB suppressed-carrier signal generated from $x(t)$. 

9.3 a. Show that (Poisson sum formula) 

Hint: Make a Fourier-series expansion of the periodic factor 

$$\sum_{k=-\infty}^{\infty} h(t - kT)$$

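b. Using the result in (a), verify the following versions of the Poisson sum: 

$$\sum_{k=-\infty}^{\infty} h(kT) = \frac{1}{T}\sum_{n=-\infty}^{\infty} H\left(\frac{n}{T}\right) \tag{i}$$

$$\sum_{k=-\infty}^{\infty} h(t - kT) = \frac{1}{T}\sum_{n=-\infty}^{\infty} H\left(\frac{n}{T}\right)\exp\left(\frac{j2\pi nt}{T}\right) \tag{ii}$$

$$\sum_{k=-\infty}^{\infty} h(kT)\exp(-j2\pi kTf) = \frac{1}{T}\sum_{n=-\infty}^{\infty} H\left(f - \frac{n}{T}\right) \tag{iii}$$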

c. Derive the condition for no intersymbol interference (Nyquist criterion) by using the 
Poisson sum formula. 

9.4 Suppose a digital communication system employs Gaussian-shaped pulses of the form 

$$x(t) = \exp(-\pi a^2 t^2)$$

To reduce the level of intersymbol interference to a relatively small amount, we impose 
the condition that x(T) = 0.01, where T is the symbol interval. The bandwidth W of the 
pulse $x(t)$ is defined as that value of $W$ for which $X(W)/X(0) = 0.01$, where $X(f)$ is 
the Fourier transform of x(t). Determine the value of W and compare this value to that of 
raised cosine spectrum with 100 percent rolloff. 

9.5 Show that the impulse response of a filter having a square-root raised cosine spectral 
characteristic is given as 


$$x_{sr}(t) = \frac{(4\beta t/T)\cos[\pi(1 + \beta)t/T] + \sin[\pi(1 - \beta)t/T]}{(\pi t/T)[1 - (4\beta t/T)^2]}$$

9.6 It is desired to implement a (discrete-time) finite impulse response (FIR) filter that provides 
square-root raised cosine spectral shaping. The coefficients of the FIR filter are the sampled 
values of the time response given in Problem 9.5, where the samples are taken at $t = kT/2$, 
for $k = 0, \pm 1, \pm 2, \ldots, \pm N$. 

a. Determine the effect on the spectral characteristic resulting from the truncation of the 
filter response for $N = 10$, 15, and 20 and roll-off factor $\beta = 1/2$, by computing their 
frequency response 

$$X_{sr}(\omega) = \sum_{n=-N}^{N} x(nT_s) e^{-j\omega nT_s}$$

where $T_s = T/2$. 

b. Plot the spectral characteristics of these three filters for $N = 10$, 15, and 20 and compare 
your results with the ideal square-root raised cosine spectrum. 
9.7 Figure P9.7 illustrates a block diagram of a QAM or PSK modulator and demodulator 
(modem) in which the modulated signals are synthesized digitally and demodulated 
digitally. The FIR filters have square-root raised cosine spectral characteristics and employ a 
sampling rate of $2/T$, where the symbol rate $1/T = 2400$ symbols/s. The FIR 
interpolators employ a sampling rate of $6/T$ and are designed as linear phase FIR filters that pass 
the desired signal spectrum. 

a. Write a software program that implements the digital modulator in Figure P9.7 for the 
following parameters: roll-off factor $\beta = 0.25$, length of FIR shaping filter = 21, length 
of FIR interpolator = 11, carrier frequency $f_c = 1800$ Hz. 

b. Generate 5000 samples of the digital signal sequence xj(n) and compute and plot the 
power spectral density of this modulated signal. 

c. Repeat (b) for five more iterations and compute the average power spectrum over the 
total of six signal records. Comment on the results. 



FIGURE P9.7 Digital QAM or PSK modem: (a) modulator; (b) demodulator.

9.8 (Carrierless QAM or PSK modem) Consider the transmission of a QAM or M-ary PSK
(M > 4) signal at a carrier frequency $f_c$, where the carrier is comparable to the bandwidth
of the baseband signal. The bandpass signal may be represented as

\[
s(t) = \mathrm{Re}\left[\sum_n I_n\, g(t - nT)\, e^{j2\pi f_c t}\right]
\]


a. Show that s(t) can be expressed as 


\[
s(t) = \mathrm{Re}\left[\sum_n I'_n\, Q(t - nT)\right]
\]

where Q(t) is defined as

\[
Q(t) = q(t) + j\hat{q}(t), \qquad q(t) = g(t)\cos 2\pi f_c t, \qquad \hat{q}(t) = g(t)\sin 2\pi f_c t
\]

and $I'_n$ is a phase-rotated symbol, i.e., $I'_n = I_n e^{j2\pi f_c nT}$.
b. Using FIR filters with responses $q(t)$ and $\hat{q}(t)$, sketch the block diagram of the modulator
and demodulator implementation that does not require the mixer to translate the signal 
to bandpass at the modulator and to baseband at the demodulator. 


9.9 (Carrierless amplitude or phase [CAP] modulation) In some practical applications in 
wireline data transmission, the bandwidth of the signal to be transmitted is comparable to 
the carrier frequency. In such systems, it is possible to eliminate the step of mixing the 
baseband signal with the carrier component. Instead, the bandpass signal can be synthesized 
directly, by embedding the carrier component in the realization of the FIR shaping filters. 
Thus, the modem is realized as shown in the block diagram in Figure P9.9, where the FIR 
shaping filters have the impulse responses 

\[
q(t) = g(t)\cos 2\pi f_c t, \qquad \hat{q}(t) = g(t)\sin 2\pi f_c t
\]


and g(t) is a pulse that has a square-root raised cosine spectral characteristic. 
a. Show that 


\[
\int_{-\infty}^{\infty} q(t)\,\hat{q}(t)\,dt = 0
\]


and that this system can be used to transmit two-dimensional signal constellations. 



FIGURE P9.9 CAP modem: (a) modulator; (b) demodulator.

b. Under what conditions is this CAP modem identical to the carrierless QAM/PSK
modem treated in Problem 9.8?


9.10 A band-limited signal having bandwidth W can be represented as 


\[
x(t) = \sum_{n=-\infty}^{\infty} x_n\, \frac{\sin[2\pi W(t - n/2W)]}{2\pi W(t - n/2W)}
\]


a. Determine the spectrum X(f) and plot \X(f)\ for the following cases: 

(i) $x_0 = 2,\ x_1 = 1,\ x_2 = -1,\ x_n = 0,\ n \neq 0, 1, 2$

(ii) $x_{-1} = -1,\ x_0 = 2,\ x_1 = -1,\ x_n = 0,\ n \neq -1, 0, 1$

b. Plot x(t) for these two cases. 

c. If these signals are used for binary signal transmission, determine the number of 
received levels possible at the sampling instants t = nT = n/2W and the probabilities 
of occurrence of the received levels. Assume that the binary digits at the transmitter are 
equally probable. 


9.11 A 4-kHz bandpass channel is to be used for transmission of data at a rate of 9600 bits/s. 
If $\frac{1}{2}N_0 = 10^{-10}$ W/Hz is the spectral density of the additive zero-mean Gaussian noise in
the channel, design a QAM modulation and determine the average power that achieves a
bit error probability of $10^{-6}$. Use a signal pulse with a raised cosine spectrum having a
roll-off factor of at least 50 percent. 


9.12 Determine the bit rate that can be transmitted through a 4-kHz voice-band telephone 
(bandpass) channel if the following modulation methods are used: 

a. Binary PAM. 

b. Four-phase PSK. 

c. 8-point QAM. 

d. Binary orthogonal FSK, with noncoherent detection. 

e. Orthogonal four-FSK with noncoherent detection. 

f. Orthogonal 8-FSK with noncoherent detection.

For (a)-(c), assume that the transmitter pulse shape has a raised cosine spectrum with a 
50 percent roll-off. 


9.13 An ideal voice-band telephone line channel has a band-pass frequency-response characteristic
spanning the frequency range 600-3000 Hz.

a. Design an M = 4 PSK (quadrature PSK or QPSK) system for transmitting data at a 
rate of 2400 bits/s and a carrier frequency f c = 1800 Hz. For spectral shaping, use a 
raised cosine frequency-response characteristic. Sketch a block diagram of the system 
and describe the functional operation of each block. 

b. Repeat (a) for a bit rate R = 4800 bits/s and an 8-QAM signal.


9.14 A voice-band telephone channel passes the frequencies in the band from 300 to 3300 Hz. 
It is desired to design a modem that transmits at a symbol rate of 2400 symbols/s, with the 
objective of achieving 9600 bits/s. Select an appropriate QAM signal constellation, carrier 
frequency, and the roll-off factor of a pulse with a raised cosine spectrum that utilizes the 
entire frequency band. Sketch the spectrum of the transmitted signal pulse and indicate the 
important frequencies. 
9.15 A communication system for a voice-band (3 kHz) channel is designed for a received SNR
at the detector of 30 dB when the transmitter power is $P_s = -3$ dBW. Determine the value
of $P_s$ if it is desired to expand the bandwidth of the system to 10 kHz, while maintaining
the same SNR at the detector. 

9.16 Show that a pulse having the raised-cosine spectrum given by Equation 9.2-26 satisfies 
the Nyquist criterion given by Equation 9.2-13 for any value of the roll-off factor $\beta$.

9.17 Show that, for any value of $\beta$, the raised cosine spectrum given by Equation 9.2-26 satisfies

\[
\int_{-\infty}^{\infty} X_{rc}(f)\, df = 1
\]

[Hint: Use the fact that $X_{rc}(f)$ satisfies the Nyquist criterion given by Equation 9.2-13.]

9.18 The Nyquist criterion gives the necessary and sufficient condition for the spectrum X(f) of 
the pulse x(t) that yields zero ISI. Prove that for any pulse that is band-limited to $|f| < 1/T$,
the zero-ISI condition is satisfied if $\mathrm{Re}[X(f)]$, for $f > 0$, consists of a rectangular function
plus an arbitrary odd function around $f = 1/2T$, and $\mathrm{Im}[X(f)]$ is any arbitrary even
function around $f = 1/2T$.

9.19 A voice-band telephone channel has a passband characteristic in the frequency range 
300 Hz < f < 3000 Hz.

a. Select a symbol rate and a power efficient constellation size to achieve 9600 bits/s 
signal transmission. 

b. If a square-root raised cosine pulse is used for the transmitter pulse g(t), select the 
roll-off factor. Assume that the channel has an ideal frequency-response characteristic. 

9.20 Design an M-ary PAM system that transmits digital information over an ideal channel with 
bandwidth W = 2400 Hz. The bit rate is 14,400 bits/s. Specify the number of transmitted 
points, the number of received signal points using a duobinary signal pulse, and the required 
$\mathcal{E}_b$ to achieve an error probability of $10^{-6}$. The additive noise is zero-mean Gaussian with
a power spectral density of $10^{-4}$ W/Hz.

9.21 A binary PAM signal is generated by exciting a raised cosine roll-off filter with a 
50 percent roll-off factor and is then DSB/SC amplitude-modulated on a sinusoidal carrier 
as illustrated in Figure P9.21. The bit rate is 2400 bits/s. 

a. Determine the spectrum of the modulated binary PAM signal and sketch it. 

b. Draw the block diagram illustrating the optimum demodulator/detector for the received 
signal, which is equal to the transmitted signal plus additive white Gaussian noise. 




FIGURE P9.21
9.22 The elements of the sequence $\{a_n\}_{n=-\infty}^{+\infty}$ are independent binary random variables taking
values of ±1 with equal probability. This data sequence is used to modulate the basic pulse
g(t) shown in Figure P9.22a. The modulated signal is

\[
X(t) = \sum_{n=-\infty}^{+\infty} a_n\, g(t - nT)
\]

a. Find the power spectral density of X(t). 

b. If $g_1(t)$ (shown in Figure P9.22b) is used instead of g(t), how would the power spectrum
in (a) change?

c. In (b) assume we want to have a null in the spectrum at $f = 1/3T$. This is done by a
precoding of the form $b_n = a_n + \alpha a_{n-3}$. Find the $\alpha$ that provides the desired null.

d. Is it possible to employ a precoding of the form $b_n = a_n + \sum_{i=1}^{N} \alpha_i a_{n-i}$ for some finite
N such that the final power spectrum will be identical to zero for $1/3T < |f| < 1/2T$?
If yes, how? If no, why? [Hint: Use properties of analytic functions.]


FIGURE P9.22 Pulses (a) g(t) and (b) $g_1(t)$.
9.23 Consider the transmission of data via PAM over a voice-band telephone channel that has
a bandwidth of 3000 Hz. Show how the symbol rate varies as a function of the excess 
bandwidth. In particular, determine the symbol rate for an excess bandwidth of 25, 33, 50, 
67, 75 and 100 percent. 

9.24 The binary sequence 10010110010 is the input to a precoder whose output is used to 
modulate a duobinary transmitting filter. Construct a table as in Table 9.2-1 showing the 
precoded sequence, the transmitted amplitude levels, the received signal levels, and the 
decoded sequence. 

9.25 Repeat Problem 9.24 for a modified duobinary signal pulse. 

9.26 A precoder for a partial response signal fails to work if the desired partial response at n = 0 
is zero modulo M. For example, consider the desired response for M = 2: 

\[
x(nT) = \begin{cases} 2 & (n = 0) \\ 1 & (n = 1) \\ -1 & (n = 2) \\ 0 & (\text{otherwise}) \end{cases}
\]

Show why this response cannot be precoded. 

9.27 Consider the RC low-pass filter shown in Figure P9.27, where $\tau = RC = 10^{-6}$.

a. Determine and sketch the envelope (group) delay of the filter as a function of frequency.

b. Suppose that the input to the filter is a lowpass signal of bandwidth $\Delta f = 1$ kHz.
Determine the effect of the RC filter on this signal.
9.28 A microwave radio channel has a frequency response

\[
C(f) = 1 + 0.3\cos 2\pi f T
\]

Determine the frequency-response characteristic of the transmitting and receiving filters
that yield zero ISI at a rate of $1/T$ symbols/s and have a 50 percent excess bandwidth.
Assume that the additive noise spectrum is flat. 

9.29 M = 4 PAM modulation is used for transmitting at a bit rate of 9600 bits/s on a channel 
having a frequency response 


\[
C(f) = \frac{1}{1 + j(f/2400)}
\]

for $|f| < 2400$, and $C(f) = 0$ otherwise. The additive noise is zero-mean white Gaussian
with power spectral density $\frac{1}{2}N_0$ W/Hz. Determine the (magnitude) frequency-response
characteristic of the optimum transmitting and receiving filters. 

9.30 Use the Cauchy-Schwarz inequality to show that the transmitter and receiver filters given 
by Equation 9.2-83 minimize the noise-to-signal ratio $\sigma_\nu^2/d^2$, where $\sigma_\nu^2$ is the noise power
given by Equation 9.2-77, where $S_{nn}(f) = N_0/2$.


9.31 Suppose that a channel frequency response is given as 

\[
C(f) = \begin{cases} 1 & |f| \le W/2 \\ \frac{1}{2} & W/2 < |f| \le W \end{cases}
\]

Determine the loss in SNR incurred, as given by Equations 9.2-87 and 9.2-88, for the filters 
given by the corresponding Equations 9.2-79 and 9.2-83, respectively. Which filters result 
in a smaller loss? 


9.32 In a binary PAM system, the input to the detector is 

\[
y_m = a_m + n_m + i_m
\]

where $a_m = \pm 1$ is the desired signal, $n_m$ is a zero-mean Gaussian random variable with
variance $\sigma_n^2$, and $i_m$ represents the ISI due to channel distortion. The ISI term is a random
variable that takes the values $-\frac{1}{2}$, 0, and $\frac{1}{2}$ with probabilities $\frac{1}{4}$, $\frac{1}{2}$, and $\frac{1}{4}$, respectively.
Determine the average probability of error as a function of $\sigma_n^2$.

9.33 In a binary PAM system, the clock that specifies the sampling of the correlator output is 
offset from the optimum sampling time by 10 percent. 

a. If the signal pulse used is rectangular, determine the loss in SNR due to the mistiming. 

b. Determine the amount of ISI introduced by the mistiming and determine its effect on 
performance. 
9.34 The frequency-response characteristic of a lowpass channel can be approximated by

\[
H(f) = \begin{cases} 1 + \alpha\cos 2\pi f t_0 & |\alpha| < 1,\ |f| < W \\ 0 & \text{otherwise} \end{cases}
\]

where W is the channel bandwidth. An input signal s(t) whose spectrum is band-limited 
to W Hz is passed through the channel. 

a. Show that 

\[
y(t) = s(t) + \frac{1}{2}\alpha\,[s(t - t_0) + s(t + t_0)]
\]

Thus, the channel produces a pair of echoes. 

b. Suppose that the received signal y(t) is passed through a filter matched to s(t). Determine
the output of the matched filter at $t = kT$, $k = 0, \pm 1, \pm 2, \ldots$, where T is the symbol
duration.

c. What is the ISI pattern resulting from the channel if $t_0 = T$?

9.35 A wireline channel of length 1000 km is used to transmit data by means of binary 
PAM. Regenerative repeaters are spaced 50 km apart along the system. Each segment 
of the channel has an ideal (constant) frequency response over the frequency band 
0 < f < 1200 Hz and an attenuation of 1 dB/km. The channel noise is AWGN.

a. What is the highest bit rate that can be transmitted without ISI?

b. Determine the required $\mathcal{E}_b/N_0$ to achieve a bit error of $P_b = 10^{-7}$ for each repeater.

c. Determine the transmitted power at each repeater to achieve the desired $\mathcal{E}_b/N_0$, where
$N_0 = 4.1 \times 10^{-21}$ W/Hz.

9.36 Prove the relationship in Equation 9.3-13 for the autocorrelation of the noise at the output 
of the matched filter. 

9.37 In the case of PAM with correlated noise, the correlation metrics in the Viterbi algorithm 
may be expressed in general as (Ungerboeck, 1974) 

\[
CM(\mathbf{I}) = 2\sum_n I_n r_n - \sum_n \sum_m I_n I_m x_{n-m}
\]

where $x_n = x(nT)$ is the sampled signal output of the matched filter, $\{I_n\}$ is the data
sequence, and $\{r_n\}$ is the received signal sequence at the output of the matched filter.
Determine the metric for the duobinary signal.

9.38 Consider the use of a (square-root) raised cosine signal pulse with a roll-off factor of unity 
for transmission of binary PAM over an ideal band-limited channel that passes the pulse 
without distortion. Thus, the transmitted signal is 

\[
v(t) = \sum_{k=-\infty}^{\infty} I_k\, g_T(t - kT_b)
\]

where the signal interval $T_b = \frac{1}{2}T$. Thus, the symbol rate is double that for no ISI.

a. Determine the ISI values at the output of a matched filter demodulator. 

b. Sketch the trellis for the maximum-likelihood sequence detector and label the states. 

9.39 A binary antipodal signal is transmitted over a nonideal band-limited channel, which 
introduces ISI over two adjacent symbols. For an isolated transmitted signal pulse s(t), the 
(noise-free) output of the demodulator is $\sqrt{\mathcal{E}_b}$ at $t = T$, $\sqrt{\mathcal{E}_b}/4$ at $t = 2T$, and zero for
$t = kT$, $k > 2$, where $\mathcal{E}_b$ is the signal energy and T is the signaling interval.

a. Determine the average probability of error, assuming that the two signals are equally 
probable and the additive noise is white and Gaussian. 

b. By plotting the error probability obtained in (a) and that for the case of no ISI, determine
the relative difference in SNR at an error probability of $10^{-6}$.

9.40 Derive the expression in Equation 9.5-5 for the coefficients in the feedback filter of the 
DFE. 

9.41 Binary PAM is used to transmit information over an unequalized linear filter channel. 
When a = 1 is transmitted, the noise-free output of the demodulator is

\[
x_m = \begin{cases} 0.3 & m = 1 \\ 0.9 & m = 0 \\ 0.3 & m = -1 \\ 0 & \text{otherwise} \end{cases}
\]


a. Design a three-tap zero-forcing linear equalizer so that the output is 

\[
q_m = \begin{cases} 1 & m = 0 \\ 0 & m = \pm 1 \end{cases}
\]

b. Determine $q_m$ for $m = \pm 2, \pm 3$, by convolving the impulse response of the equalizer
with the channel response.

9.42 The transmission of a signal pulse with a raised cosine spectrum through a channel results 
in the following (noise-free) sampled output from the demodulator: 


\[
x_k = \begin{cases} -0.5 & k = -2 \\ 0.1 & k = -1 \\ 1 & k = 0 \\ -0.2 & k = 1 \\ 0.05 & k = 2 \\ 0 & \text{otherwise} \end{cases}
\]


a. Determine the tap coefficients of a three-tap linear equalizer based on the zero-forcing 
criterion. 

b. For the coefficients determined in (a), determine the output of the equalizer for the case 
of the isolated pulse. Thus, determine the residual ISI and its span in time. 

9.43 A nonideal band-limited channel introduces ISI over three successive symbols. The (noise-free)
response of the matched filter demodulator sampled at the sampling time kT is


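\[
\int_{-\infty}^{\infty} s(t)\,s(t - kT)\,dt = \begin{cases} \mathcal{E}_b & k = 0 \\ 0.9\,\mathcal{E}_b & k = \pm 1 \\ 0.1\,\mathcal{E}_b & k = \pm 2 \\ 0 & \text{otherwise} \end{cases}
\]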
a. Determine the tap coefficients of a three-tap linear equalizer that equalizes the channel
(received signal) response to an equivalent partial-response (duobinary) signal

\[
y_k = \begin{cases} \mathcal{E}_b & k = 0, 1 \\ 0 & \text{otherwise} \end{cases}
\]

b. Suppose that the linear equalizer in (a) is followed by a Viterbi sequence detector for
the partial signal. Give an estimate of the error probability if the additive noise is white
and Gaussian, with power spectral density $\frac{1}{2}N_0$ W/Hz.

9.44 Determine the tap weight coefficients of a three-tap zero-forcing equalizer if the ISI spans
three symbols and is characterized by the values $x(0) = 1$, $x(-1) = 0.3$, $x(1) = 0.2$. Also
determine the residual ISI at the output of the equalizer for the optimum tap coefficients.

9.45 In line-of-sight microwave radio transmission, the signal arrives at the receiver via two
propagation paths: the direct path and a delayed path that occurs due to signal reflection
from surrounding terrain. Suppose that the received signal has the form

\[
r(t) = s(t) + \alpha s(t - T) + n(t)
\]

where s(t) is the transmitted signal, $\alpha$ is the attenuation ($\alpha < 1$) of the secondary path,
and n(t) is AWGN.

a. Determine the output of the demodulator at $t = T$ and $t = 2T$ that employs a filter
matched to s(t).

b. Determine the probability of error for a symbol-by-symbol detector if the transmitted
signal is binary antipodal and the detector ignores the ISI.

c. What is the error rate performance of a simple (one-tap) DFE that estimates $\alpha$ and
removes the ISI? Sketch the detector structure that employs a DFE.

9.46 Repeat Problem 9.41 using the MSE as the criterion for optimizing the tap coefficients.
Assume that the noise power spectral density is 0.1 W/Hz.

9.47 In a magnetic recording channel, where the readback pulse resulting from a positive transition
in the write current has the form


a linear equalizer is used to equalize the pulse to a partial response. The parameter $T_{50}$ is
defined as the width of the pulse at the 50 percent amplitude level. The bit rate is $1/T_b$, and
the ratio $T_{50}/T_b = \Delta$ is the normalized density of the recording. Suppose the pulse is
equalized to the partial-response values

x(nT) =

where x(t) represents the equalized pulse shape.

a. Determine the spectrum X(f) of the band-limited equalized pulse.

b. Determine the possible output levels at the detector, assuming that successive transitions
can occur at the rate $1/T_b$.
c. Determine the error rate performance of the symbol-by-symbol detector for this signal,
assuming that the additive noise is zero-mean Gaussian with variance $\sigma^2$.

9.48 Sketch the trellis for the Viterbi detector of the equalized signal in Problem 9.47 and 
label all the states. Also, determine the minimum Euclidean distance between merging 
paths. 

9.49 Consider the problem of equalizing the discrete-time equivalent channel shown in 
Figure P9.49. The information sequence $\{I_n\}$ is binary (±1) and uncorrelated. The additive
noise $\{\nu_n\}$ is white and real-valued, with variance $N_0$. The received sequence $\{y_n\}$ is
processed by a linear three-tap equalizer that is optimized on the basis of the MSE criterion.

a. Determine the optimum coefficients of the equalizer as a function of $N_0$.

b. Determine the three eigenvalues $\lambda_1$, $\lambda_2$, and $\lambda_3$ of the covariance matrix $\Gamma$ and the
corresponding (normalized to unit length) eigenvectors $v_1$, $v_2$, $v_3$.

c. Determine the minimum MSE for the three-tap equalizer as a function of $N_0$.

d. Determine the output SNR for the three-tap equalizer as a function of $N_0$. How does
this compare with the output SNR for the infinite-tap equalizer? For example, evaluate
the output SNR for these two equalizers when $N_0 = 0.1$.



FIGURE P9.49 


9.50 Use the orthogonality principle to derive the equations for the coefficients in a decision-feedback
equalizer based on the MSE criterion and given by Equations 9.5-3 and 9.5-5.

9.51 Suppose that the discrete-time model for the intersymbol interference is characterized by 
the tap coefficients $f_0, f_1, \ldots, f_L$. From the equations for the tap coefficients of a decision-feedback
equalizer (DFE), show that only L taps are needed in the feedback filter of the
DFE. That is, if $\{c_k\}$ are the coefficients of the feedback filter, then $c_k = 0$ for $k \ge L + 1$.

9.52 Consider the channel model shown in Figure P9.52. $\{\nu_n\}$ is a real-valued white noise
sequence with zero-mean and variance $N_0$. Suppose the channel is to be equalized by a
DFE having a two-tap feedforward filter $(c_0, c_{-1})$ and a one-tap feedback filter $(c_1)$. The
$\{c_i\}$ are optimized using the MSE criterion.

a. Determine the optimum coefficients and their approximate values for $N_0 \ll 1$.

b. Determine the exact value of the minimum MSE and a first-order approximation
appropriate to the case $N_0 \ll 1$.

c. Determine the exact value of the output SNR for the three-tap equalizer as a function
of $N_0$ and a first-order approximation appropriate to the case $N_0 \ll 1$.

d. Compare the results in (b) and (c) with the performance of the infinite-tap DFE.

e. Evaluate and compare the exact values of the output SNR for the three-tap and infinite-tap
DFE in the special cases where $N_0 = 0.1$ and 0.01. Comment on how well the
three-tap equalizer performs relative to the infinite-tap equalizer.



FIGURE P9.52 


9.53 A pulse and its (raised cosine) spectral characteristic are shown in Figure P9.53. This 
pulse is used for transmitting digital information over a band-limited channel at a rate 1 /T 
symbols/s. 

a. What is the roll-off factor $\beta$?

b. What is the pulse rate?

c. The channel distorts the signal pulses. Suppose the sampled values of the filtered received
pulse x(t) are as shown in Figure P9.53c. It is obvious that there are five interfering
signal components. Give the sequence of +1s and −1s that will cause the
largest (destructive or constructive) interference and the corresponding value of the
interference (the peak distortion).

d. What is the probability of occurrence of the worst sequence obtained in (c), assuming 
that all binary digits are equally probable and independent? 

FIGURE P9.53

9.54 A time-dispersive channel having an impulse response h(t) is used to transmit four-phase 
PSK at a rate $R = 1/T$ symbols/s. The equivalent discrete-time channel is shown in
Figure P9.54. The sequence $\{\eta_k\}$ is a white noise sequence having zero-mean and variance
$\sigma^2 = N_0$.

a. What is the sampled autocorrelation function sequence $\{x_k\}$ defined by


for this channel? 

b. The minimum MSE performance of a linear equalizer and a decision-feedback equalizer 
having an infinite number of taps depends on the folded-spectrum of the channel 


where $H(\omega)$ is the Fourier transform of h(t). Determine the folded spectrum of the
channel given above.

c. Use your answer in (b) to express the minimum MSE of a linear equalizer in terms of 
the folded spectrum of the channel. (You may leave your answer in integral form.) 

d. Repeat (c) for an infinite-tap decision-feedback equalizer. 


FIGURE P9.54

9.55 Consider a four-level PAM system with possible transmitted levels 3, 1, −1, and −3.
The channel through which the data is transmitted introduces intersymbol interference
over two successive symbols. The equivalent discrete-time channel model is shown in
Figure P9.55. $\{n_k\}$ is a sequence of real-valued independent zero-mean Gaussian noise
variables with variance $\sigma^2 = N_0$. The received sequence is

\[
\begin{aligned}
y_1 &= 0.8 I_1 + n_1 \\
y_2 &= 0.8 I_2 - 0.6 I_1 + n_2 \\
y_3 &= 0.8 I_3 - 0.6 I_2 + n_3 \\
y_k &= 0.8 I_k - 0.6 I_{k-1} + n_k
\end{aligned}
\]

a. Sketch the tree structure, showing the possible signal sequences for the received signals
$y_1$, $y_2$, and $y_3$.

b. Suppose the Viterbi algorithm is used to detect the information sequence. How many
probabilities must be computed at each stage of the algorithm?

c. How many surviving sequences are there in the Viterbi algorithm for this channel?
d. Suppose that the received signals are

\[
y_1 = 0.5, \quad y_2 = 2.0, \quad y_3 = -1.0
\]

Determine the surviving sequences through stage $y_3$ and the corresponding metrics.

e. Give a tight upper bound for the probability of error for four-level PAM transmitted 
over this channel. 



FIGURE P9.55 


9.56 A transversal equalizer with K taps has an impulse response 

\[
e(t) = \sum_{k=0}^{K-1} c_k\, \delta(t - kT)
\]

where T is the delay between adjacent taps, and a transfer function

\[
E(z) = \sum_{k=0}^{K-1} c_k z^{-k}
\]

The discrete Fourier transform (DFT) of the equalizer coefficients $\{c_k\}$ is defined as

\[
E_n = E(z)\big|_{z = e^{j2\pi n/K}} = \sum_{k=0}^{K-1} c_k e^{-j2\pi kn/K}, \qquad n = 0, 1, \ldots, K-1
\]

The inverse DFT is defined as

\[
b_k = \frac{1}{K}\sum_{n=0}^{K-1} E_n e^{j2\pi nk/K}, \qquad k = 0, 1, \ldots, K-1
\]

a. Show that $b_k = c_k$, by substituting for $E_n$ in the above expression.

b. From the relations given above, derive an equivalent filter structure having the z
transform

\[
E(z) = \underbrace{\frac{1 - z^{-K}}{K}}_{E_1(z)}\ \underbrace{\sum_{n=0}^{K-1} \frac{E_n}{1 - e^{j2\pi n/K} z^{-1}}}_{E_2(z)}
\]

c. If E(z) is considered as two separate filters $E_1(z)$ and $E_2(z)$ in cascade, sketch a block
diagram for each of the filters, using $z^{-1}$ to denote a unit of delay.

d. In the transversal equalizer, the adjustable parameters are the equalizer coefficients $\{c_k\}$.
What are the adjustable parameters of the equivalent equalizer in (b), and how are they
related to $\{c_k\}$?



Adaptive Equalization 


In Chapter 9, we introduced both optimum and suboptimum receivers that compensate
for ISI in the transmission of digital information through band-limited, nonideal
channels. The optimum receiver employed maximum-likelihood sequence estimation
for detecting the information sequence from the samples of the demodulation filter.
The suboptimum receivers employed either a linear equalizer or a decision-feedback
equalizer.

In the development of the three equalization methods, we implicitly assumed that
the channel characteristics, either the impulse response or the frequency response,
were known at the receiver. However, in most communication systems that employ
equalizers, the channel characteristics are unknown a priori and, in many cases, the
channel response is time-variant. In such a case, the equalizers are designed to be
adjustable to the channel response and, for time-variant channels, to be adaptive to the
time variations in the channel response.

In this chapter, we present algorithms for automatically adjusting the equalizer coefficients
to optimize a specified performance index and to adaptively compensate for
time variations in the channel characteristics. We also analyze the performance characteristics
of the algorithms, including their rate of convergence and their computational
complexity.


10.1 ADAPTIVE LINEAR EQUALIZER

In the case of the linear equalizer, recall that we considered two different criteria 
for determining the values of the equalizer coefficients $\{c_k\}$. One criterion was based
on the minimization of the peak distortion at the output of the equalizer, which is 
defined by Equation 9.4-22. The other criterion was based on the minimization of the 
mean square error at the output of the equalizer, which is defined by Equation 9.4-42. 
Below, we describe two algorithms for performing the optimization automatically and 
adaptively. 
10.1-1 The Zero-Forcing Algorithm

In the peak-distortion criterion, the peak distortion $\mathcal{D}(\mathbf{c})$, given by Equation 9.4-22, is
minimized by selecting the equalizer coefficients $\{c_k\}$. In general, there is no simple
computational algorithm for performing this optimization, except in the special case
where the peak distortion at the input to the equalizer, defined as $\mathcal{D}_0$ in Equation 9.4-23,
is less than unity. When $\mathcal{D}_0 < 1$, the distortion $\mathcal{D}(\mathbf{c})$ at the output of the equalizer is
minimized by forcing the equalizer response $q_n = 0$, for $1 \le |n| \le K$, and $q_0 = 1$. In
this case, there is a simple computational algorithm, called the zero-forcing algorithm,
that achieves these conditions.

The zero-forcing solution is achieved by forcing the cross correlation between the
error sequence $\varepsilon_k = I_k - \hat{I}_k$ and the desired information sequence $\{I_k\}$ to be zero
for shifts in the range $0 \le |n| \le K$. The demonstration that this leads to the desired
solution is quite simple. We have

\[
E(\varepsilon_k I_{k-j}^*) = E[(I_k - \hat{I}_k) I_{k-j}^*] = E(I_k I_{k-j}^*) - E(\hat{I}_k I_{k-j}^*), \qquad j = -K, \ldots, K \qquad (10.1\text{-}1)
\]

We assume that the information symbols are uncorrelated, i.e., $E(I_k I_j^*) = \delta_{kj}$, and that
the information sequence $\{I_k\}$ is uncorrelated with the additive noise sequence $\{\eta_k\}$.
For $\hat{I}_k$, we use the expression given in Equation 9.4-41. Then, after taking the expected
values in Equation 10.1-1, we obtain

\[
E(\varepsilon_k I_{k-j}^*) = \delta_{j0} - q_j, \qquad j = -K, \ldots, K \qquad (10.1\text{-}2)
\]

Therefore, the conditions

\[
E(\varepsilon_k I_{k-j}^*) = 0, \qquad j = -K, \ldots, K \qquad (10.1\text{-}3)
\]

are fulfilled when $q_0 = 1$ and $q_n = 0$, $1 \le |n| \le K$.
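The step from Equation 10.1-1 to Equation 10.1-2 can be spelled out as follows. This is a sketch that assumes the equalizer output of Chapter 9 has the form $\hat{I}_k = \sum_n q_n I_{k-n} + \sum_i c_i \eta_{k-i}$, where $\{q_n\}$ is the combined channel-equalizer response; that form is recalled from Chapter 9 and is not restated in this section.

```latex
\begin{align*}
E(I_k I_{k-j}^*) &= \delta_{j0} \\
E(\hat{I}_k I_{k-j}^*) &= \sum_n q_n\, E(I_{k-n} I_{k-j}^*) + \sum_i c_i\, E(\eta_{k-i} I_{k-j}^*)
                        = q_j + 0 \\
E(\varepsilon_k I_{k-j}^*) &= E(I_k I_{k-j}^*) - E(\hat{I}_k I_{k-j}^*) = \delta_{j0} - q_j
\end{align*}
```

The first sum collapses to the single term $n = j$ because the symbols are uncorrelated, and the second sum vanishes because the symbols are uncorrelated with the noise.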

When the channel response is unknown, the cross correlations given by Equation 10.1-1
are also unknown. This difficulty can be circumvented by transmitting a
known training sequence $\{I_k\}$ to the receiver, which can be used to estimate the cross
correlation by substituting time averages for the ensemble averages given in Equation
10.1-1. After the initial training, which will require the transmission of a training sequence
of some predetermined length that equals or exceeds the equalizer length, the
equalizer coefficients that satisfy Equation 10.1-3 can be determined.

A simple recursive algorithm for adjusting the equalizer coefficients is

\[
c_j^{(k+1)} = c_j^{(k)} + \Delta\, \varepsilon_k I_{k-j}^*, \qquad j = -K, \ldots, -1, 0, 1, \ldots, K \qquad (10.1\text{-}4)
\]

where $c_j^{(k)}$ is the value of the jth coefficient at time $t = kT$, $\varepsilon_k = I_k - \hat{I}_k$ is the error
signal at time $t = kT$, and $\Delta$ is a scale factor that controls the rate of adjustment, as will
be explained later in this section. This is the zero-forcing algorithm. The term $\varepsilon_k I_{k-j}^*$
is an estimate of the cross correlation (ensemble average) $E(\varepsilon_k I_{k-j}^*)$. The averaging
operation of the cross correlation is accomplished by means of the recursive first-order
difference equation algorithm in Equation 10.1-4, which represents a simple discrete-time
integrator.
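The recursion in Equation 10.1-4 can be illustrated with a short simulation. The sketch below is a minimal example under stated assumptions, not an implementation taken from the text: the symbols are real-valued binary (±1), the received samples are noise-free, the sampled channel response `x` is a made-up example whose input peak distortion is below unity, and the names `zero_forcing_train` and `combined_response` are hypothetical.

```python
import random


def zero_forcing_train(x, K, delta, n_symbols, seed=0):
    """Training-mode zero-forcing recursion for real-valued symbols.

    x maps n -> x_n, the sampled channel response, so that the received
    sample is v_k = sum_n x_n * I_{k-n} (noise-free, an assumption here).
    Returns the taps c_j, j = -K..K, as a dict.
    """
    rng = random.Random(seed)
    span = max(abs(n) for n in x)
    pad = K + span  # guard symbols so every index stays in range
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(n_symbols + 2 * pad)]

    def received(k):
        return sum(xn * symbols[k - n] for n, xn in x.items())

    c = {j: 0.0 for j in range(-K, K + 1)}
    for k in range(pad, n_symbols + pad):
        estimate = sum(c[j] * received(k - j) for j in c)  # I_hat_k
        error = symbols[k] - estimate                      # eps_k = I_k - I_hat_k
        for j in c:
            # c_j <- c_j + delta * eps_k * I_{k-j}  (conjugate dropped: real symbols)
            c[j] += delta * error * symbols[k - j]
    return c


def combined_response(c, x, n):
    """q_n = sum_j c_j * x_{n-j}, the channel-plus-equalizer response."""
    return sum(cj * x.get(n - j, 0.0) for j, cj in c.items())


if __name__ == "__main__":
    channel = {-1: 0.1, 0: 1.0, 1: 0.2}  # hypothetical example, D_0 = 0.3 < 1
    taps = zero_forcing_train(channel, K=2, delta=0.01, n_symbols=5000)
    for n in range(-2, 3):
        print(n, round(combined_response(taps, channel, n), 3))
```

After training, the printed values of $q_n$ should be close to 1 at $n = 0$ and close to 0 for $1 \le |n| \le K$, which is the zero-forcing condition. The residual ISI outside that range is not controlled by the algorithm.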
FIGURE 10.1-1 An adaptive zero-forcing equalizer.


Following the training period, after which the equalizer coefficients have converged
to their optimum values, the decisions at the output of the detector are generally sufficiently
reliable so that they may be used to continue the coefficient adaptation process.
This is called a decision-directed mode of adaptation. In such a case, the cross correlations
in Equation 10.1-4 involve the error signal $\tilde{\varepsilon}_k = \tilde{I}_k - \hat{I}_k$ and the detected
output sequence $\tilde{I}_{k-j}$, $j = -K, \ldots, K$. Thus, in the adaptive mode, Equation 10.1-4
becomes

\[
c_j^{(k+1)} = c_j^{(k)} + \Delta\, \tilde{\varepsilon}_k \tilde{I}_{k-j}^* \qquad (10.1\text{-}5)
\]

Figure 10.1-1 illustrates the zero-forcing equalizer in the training mode and the adaptive
mode of operation.

The characteristics of the zero-forcing algorithm are similar to those of the least- 
mean-square (LMS) algorithm, which minimizes the MSE and which is described in 
detail in the following section. 


10.1-2 The LMS Algorithm 

In the minimization of the MSE, treated in Section 9.4-2, we found that the optimum
equalizer coefficients are determined from the solution of the set of linear equations,
expressed in matrix form as

\[
\boldsymbol{\Gamma}\mathbf{C} = \boldsymbol{\xi} \qquad (10.1\text{-}6)
\]

where $\boldsymbol{\Gamma}$ is the $(2K+1) \times (2K+1)$ covariance matrix of the signal samples $\{v_k\}$, $\mathbf{C}$ is
the column vector of $(2K+1)$ equalizer coefficients, and $\boldsymbol{\xi}$ is a $(2K+1)$-dimensional
column vector of channel filter coefficients. The solution for the optimum equalizer
coefficients vector $\mathbf{C}_{opt}$ can be determined by inverting the covariance matrix $\boldsymbol{\Gamma}$, which
can be efficiently performed by use of the Levinson-Durbin algorithm (see Levinson
(1947) and Durbin (1959)).

Alternatively, an iterative procedure that avoids the direct matrix inversion may
be used to compute $\mathbf{C}_{\text{opt}}$. Probably the simplest iterative procedure is the method of
steepest descent, in which one begins by arbitrarily choosing the vector $\mathbf{C}$, say as $\mathbf{C}_0$.
This initial choice of coefficients corresponds to some point on the quadratic MSE
surface in the $(2K+1)$-dimensional space of coefficients. The gradient vector $\mathbf{G}_0$,
having the $2K+1$ gradient components $\tfrac{1}{2}\,\partial J/\partial c_{0k}$, $k = -K, \ldots, -1, 0, 1, \ldots, K$, is
then computed at this point on the MSE surface, and each tap weight is changed in
the direction opposite to its corresponding gradient component. The change in the $j$th
tap weight is proportional to the size of the $j$th gradient component. Thus, succeeding
values of the coefficient vector $\mathbf{C}$ are obtained according to the relation

$$\mathbf{C}_{k+1} = \mathbf{C}_k - \Delta \mathbf{G}_k, \qquad k = 0, 1, 2, \ldots \tag{10.1-7}$$

where the gradient vector $\mathbf{G}_k$ is

$$\mathbf{G}_k = \frac{1}{2}\frac{dJ}{d\mathbf{C}_k} = \boldsymbol{\Gamma}\mathbf{C}_k - \boldsymbol{\xi} = -E(\varepsilon_k \mathbf{V}_k^*) \tag{10.1-8}$$

The vector $\mathbf{C}_k$ represents the set of coefficients at the $k$th iteration, $\varepsilon_k = I_k - \hat{I}_k$ is
the error signal at the $k$th iteration, $\mathbf{V}_k$ is the vector of received signal samples that
make up the estimate $\hat{I}_k$, i.e., $\mathbf{V}_k = [v_{k+K} \cdots v_k \cdots v_{k-K}]^t$, and $\Delta$ is a positive number
chosen small enough to ensure convergence of the iterative procedure. If the minimum
MSE is reached for some $k = k_0$, then $\mathbf{G}_k = \mathbf{0}$, so that no further change occurs in
the tap weights. In general, $J_{\min}(K)$ cannot be attained for a finite value of $k_0$ with the
steepest-descent method. It can, however, be approached as closely as desired for some
finite value of $k_0$.
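
The recursion in Equations 10.1-7 and 10.1-8 can be tried on a small case. The following is a minimal sketch for real-valued quantities, assuming that $\boldsymbol{\Gamma}$ and $\boldsymbol{\xi}$ are known exactly; the function name and the two-tap values `GAMMA` and `XI` are hypothetical and chosen only for illustration.

```python
def steepest_descent(gamma, xi, delta, iterations):
    """Iterate C_{k+1} = C_k - delta * G_k with G_k = Gamma C_k - xi (real-valued case)."""
    n = len(xi)
    c = [0.0] * n                      # arbitrary starting point C_0
    for _ in range(iterations):
        # exact gradient G_k = Gamma C_k - xi (Equation 10.1-8)
        g = [sum(gamma[i][j] * c[j] for j in range(n)) - xi[i] for i in range(n)]
        c = [c[i] - delta * g[i] for i in range(n)]
    return c

# Hypothetical 2-tap example: these values are illustrative only.
GAMMA = [[1.0, 0.5],
         [0.5, 1.0]]
XI = [1.0, 0.0]

c_sd = steepest_descent(GAMMA, XI, delta=0.5, iterations=200)

# Closed-form solution of Gamma C = xi (Equation 10.1-6) for comparison
det = GAMMA[0][0] * GAMMA[1][1] - GAMMA[0][1] * GAMMA[1][0]
c_opt = [(GAMMA[1][1] * XI[0] - GAMMA[0][1] * XI[1]) / det,
         (GAMMA[0][0] * XI[1] - GAMMA[1][0] * XI[0]) / det]
print(c_sd, c_opt)
```

After enough iterations the two vectors agree to within numerical precision, although the iteration never reaches the optimum exactly in a finite number of steps.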

The basic difficulty with the method of steepest descent for determining the optimum
tap weights is the lack of knowledge of the gradient vector $\mathbf{G}_k$, which depends
on both the covariance matrix $\boldsymbol{\Gamma}$ and the vector $\boldsymbol{\xi}$ of cross correlations. In turn, these
quantities depend on the coefficients $\{f_k\}$ of the equivalent discrete-time channel model
and on the covariance of the information sequence and the additive noise, all of which
may be unknown at the receiver in general. To overcome the difficulty, estimates of
the gradient vector may be used. That is, the algorithm for adjusting the tap weight
coefficients may be expressed in the form

$$\hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k - \Delta \hat{\mathbf{G}}_k \tag{10.1-9}$$

where $\hat{\mathbf{G}}_k$ denotes an estimate of the gradient vector $\mathbf{G}_k$ and $\hat{\mathbf{C}}_k$ denotes the estimate
of the vector of coefficients.

From Equation 10.1-8 we note that $\mathbf{G}_k$ is the negative of the expected value of
$\varepsilon_k \mathbf{V}_k^*$. Consequently, an estimate of $\mathbf{G}_k$ is

$$\hat{\mathbf{G}}_k = -\varepsilon_k \mathbf{V}_k^* \tag{10.1-10}$$

FIGURE 10.1-2

Linear adaptive equalizer based on the MSE criterion.


Since $E(\hat{\mathbf{G}}_k) = \mathbf{G}_k$, the estimate $\hat{\mathbf{G}}_k$ is an unbiased estimate of the true gradient vector
$\mathbf{G}_k$. Incorporation of Equation 10.1-10 into Equation 10.1-9 yields the algorithm

$$\hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k + \Delta\, \varepsilon_k \mathbf{V}_k^* \tag{10.1-11}$$

This is the basic LMS algorithm for recursively adjusting the tap weight coefficients of
the equalizer as described by Widrow (1966). It is illustrated in the equalizer shown in
Figure 10.1-2.
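
The update in Equation 10.1-11 can be sketched in a few lines. The example below is an illustration under stated assumptions, not a reference implementation: the two-tap channel, the QPSK training symbols, the noise level, the step size, and the function name `lms_equalizer` are all hypothetical choices made for the demonstration.

```python
import random

def lms_equalizer(v, training, K, delta):
    """Run Equation 10.1-11 over a training sequence; return taps and squared errors."""
    c = [0j] * (2 * K + 1)                       # taps c_{-K}, ..., c_K
    sq_err = []
    for k in range(K, len(v) - K):
        # V_k = [v_{k+K}, ..., v_k, ..., v_{k-K}]
        window = [v[k - j] for j in range(-K, K + 1)]
        estimate = sum(cj * vj for cj, vj in zip(c, window))
        err = training[k] - estimate             # eps_k = I_k - I_hat_k
        c = [cj + delta * err * vj.conjugate() for cj, vj in zip(c, window)]
        sq_err.append(abs(err) ** 2)
    return c, sq_err

random.seed(1)
A = 2 ** -0.5
QPSK = [complex(A, A), complex(A, -A), complex(-A, A), complex(-A, -A)]
symbols = [random.choice(QPSK) for _ in range(4000)]
channel = [1.0, 0.4]          # hypothetical equivalent discrete-time channel {f_k}
noise_std = 0.05              # hypothetical noise level per real dimension

received = []
for k in range(len(symbols)):
    s = sum(channel[n] * symbols[k - n] for n in range(len(channel)) if k - n >= 0)
    received.append(s + complex(random.gauss(0, noise_std), random.gauss(0, noise_std)))

taps, sq_err = lms_equalizer(received, symbols, K=5, delta=0.015)
early = sum(sq_err[:100]) / 100
late = sum(sq_err[-500:]) / 500
print(f"MSE over first 100 symbols: {early:.3f}, over last 500: {late:.4f}")
```

The averaged squared error falls from its initial value toward a small steady-state level as the taps converge.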

The basic algorithm given by Equation 10.1-11 and some of its possible variations
have been incorporated into many commercial adaptive equalizers that are used in
high-speed modems. Three variations of the basic algorithm are obtained by using only sign
information contained in the error signal $\varepsilon_k$ and/or in the components of $\mathbf{V}_k$. Hence,
the three possible variations are

$$c_{(k+1)j} = c_{kj} + \Delta\, \mathrm{csgn}(\varepsilon_k)\, v_{k-j}^*, \qquad j = -K, \ldots, -1, 0, 1, \ldots, K \tag{10.1-12}$$

$$c_{(k+1)j} = c_{kj} + \Delta\, \varepsilon_k\, \mathrm{csgn}(v_{k-j}^*), \qquad j = -K, \ldots, -1, 0, 1, \ldots, K \tag{10.1-13}$$

$$c_{(k+1)j} = c_{kj} + \Delta\, \mathrm{csgn}(\varepsilon_k)\, \mathrm{csgn}(v_{k-j}^*), \qquad j = -K, \ldots, -1, 0, 1, \ldots, K \tag{10.1-14}$$

where $\mathrm{csgn}(x)$ is defined as

$$\mathrm{csgn}(x) = \begin{cases} 1 + j & [\mathrm{Re}(x) > 0,\ \mathrm{Im}(x) > 0] \\ 1 - j & [\mathrm{Re}(x) > 0,\ \mathrm{Im}(x) < 0] \\ -1 + j & [\mathrm{Re}(x) < 0,\ \mathrm{Im}(x) > 0] \\ -1 - j & [\mathrm{Re}(x) < 0,\ \mathrm{Im}(x) < 0] \end{cases} \tag{10.1-15}$$

(Note that in Equation 10.1-15, $j = \sqrt{-1}$, as distinct from the index $j$ in Equations
10.1-12 to 10.1-14.) Clearly, the algorithm in Equation 10.1-14 is the most
easily implemented, but it gives the slowest rate of convergence relative to the others.
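
The three sign-based variations differ only in which factor of the update is replaced by its complex sign. A minimal sketch for a single tap follows. The variant labels (`"sign-error"`, `"sign-data"`, `"sign-sign"`) and the function names are hypothetical, and mapping a zero real or imaginary part to $+1$ is an assumption, since Equation 10.1-15 leaves that case undefined.

```python
def csgn(x):
    """Complex sign of Equation 10.1-15; zero parts are mapped to +1 (an assumption)."""
    re = 1.0 if x.real >= 0 else -1.0
    im = 1.0 if x.imag >= 0 else -1.0
    return complex(re, im)

def update_tap(c, err, v, delta, variant="lms"):
    """One-tap update; v is v_{k-j}, err is eps_k, c is c_{kj}."""
    v_conj = complex(v).conjugate()
    if variant == "sign-error":        # Equation 10.1-12
        return c + delta * csgn(err) * v_conj
    if variant == "sign-data":         # Equation 10.1-13
        return c + delta * err * csgn(v_conj)
    if variant == "sign-sign":         # Equation 10.1-14
        return c + delta * csgn(err) * csgn(v_conj)
    return c + delta * err * v_conj    # basic LMS, Equation 10.1-11

for name in ("lms", "sign-error", "sign-data", "sign-sign"):
    print(name, update_tap(0j, 0.5 + 0.5j, 1 - 1j, 0.1, name))
```

In the last variant the correction to each tap takes one of only a few fixed values scaled by $\Delta$, which is what makes it the easiest to implement.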

Several other variations of the LMS algorithm are obtained by averaging or filtering
the gradient vectors over several iterations prior to making adjustments of the equalizer
coefficients. For example, the average over $N$ gradient vectors is

$$\bar{\hat{\mathbf{G}}}_{mN} = -\frac{1}{N} \sum_{n=0}^{N-1} \varepsilon_{mN+n} \mathbf{V}_{mN+n}^* \tag{10.1-16}$$

and the corresponding recursive equation for updating the equalizer coefficients once
every $N$ iterations is

$$\hat{\mathbf{C}}_{(k+1)N} = \hat{\mathbf{C}}_{kN} - \Delta\, \bar{\hat{\mathbf{G}}}_{kN} \tag{10.1-17}$$

In effect, the averaging operation performed in Equation 10.1-16 reduces the noise in
the estimate of the gradient vector, as shown by Gardner (1984).

An alternative approach is to filter the noisy gradient vectors by a low-pass filter
and use the output of the filter as an estimate of the gradient vector. For example, a
simple low-pass filter for the noisy gradients yields as an output

$$\bar{\hat{\mathbf{G}}}_k = w\, \bar{\hat{\mathbf{G}}}_{k-1} + (1 - w)\, \hat{\mathbf{G}}_k, \qquad \bar{\hat{\mathbf{G}}}(0) = \hat{\mathbf{G}}(0) \tag{10.1-18}$$

where the choice of $0 < w < 1$ determines the bandwidth of the low-pass filter. When
$w$ is close to unity, the filter bandwidth is small and the effective averaging is performed
over many gradient vectors. On the other hand, when $w$ is small, the low-pass filter has
a large bandwidth and, hence, it provides little averaging of the gradient vectors. With
the filtered gradient vectors given by Equation 10.1-18 in place of $\hat{\mathbf{G}}_k$, we obtain the
filtered gradient LMS algorithm given by

$$\hat{\mathbf{C}}_{k+1} = \hat{\mathbf{C}}_k - \Delta\, \bar{\hat{\mathbf{G}}}_k \tag{10.1-19}$$
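
Both smoothing operations, the block average of Equation 10.1-16 and the first-order low-pass filter of Equation 10.1-18, act on a sequence of noisy gradient vectors. The sketch below assumes the noisy gradients have already been formed as lists of numbers; the function names are hypothetical.

```python
def block_average(gradients):
    """Average N noisy gradient vectors, as in Equation 10.1-16."""
    n = len(gradients)
    return [sum(g[i] for g in gradients) / n for i in range(len(gradients[0]))]

def lowpass_gradients(gradients, w):
    """Recursion of Equation 10.1-18, started from the first noisy gradient."""
    smoothed = list(gradients[0])
    history = [smoothed]
    for g in gradients[1:]:
        smoothed = [w * s + (1 - w) * gi for s, gi in zip(smoothed, g)]
        history.append(smoothed)
    return history

noisy = [[1.0, 2.0], [3.0, 6.0], [2.0, 4.0]]   # illustrative gradient estimates
print(block_average(noisy))
print(lowpass_gradients(noisy, w=0.9)[-1])      # heavy smoothing: stays near the start
print(lowpass_gradients(noisy, w=0.1)[-1])      # light smoothing: follows the latest value
```

A value of `w` near unity makes the filtered gradient change slowly, which corresponds to averaging over many gradient vectors.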

In the above discussion, it has been assumed that the receiver has knowledge of 
the transmitted information sequence in forming the error signal between the desired 
symbol and its estimate. Such knowledge can be made available during a short training 
period in which a signal with a known information sequence is transmitted to the 
receiver for initially adjusting the tap weights. The length of this sequence must be at 
least as large as the length of the equalizer so that the spectrum of the transmitted signal 
adequately covers the bandwidth of the channel being equalized. 

In practice, the training sequence is often selected to be a periodic pseudorandom 
sequence, such as a maximum length shift-register sequence whose period N is equal to 
the length of the equalizer ( N = 2 K + 1). In this case, the gradient is usually averaged 
over the length of the sequence as indicated in Equation 10.1-16 and the equalizer 
is adjusted once a period according to Equation 10.1-17. This approach has been 
called cyclic equalization, and has been treated in the papers by Mueller and Spaulding
(1975) and Qureshi (1977, 1985). A practical scheme for continuous adjustment of the
tap weights may be either a decision-directed mode of operation in which decisions on
the information symbols are assumed to be correct and used in place of $I_k$ in forming
the error signal $\varepsilon_k$, or one in which a known pseudorandom-probe sequence is inserted
in the information-bearing signal either additively or by interleaving in time and the tap
weights adjusted by comparing the received probe symbols with the known transmitted
probe symbols. In the decision-directed mode of operation, the error signal becomes
$\tilde{\varepsilon}_k = \tilde{I}_k - \hat{I}_k$, where $\tilde{I}_k$ is the decision of the receiver based on the estimate $\hat{I}_k$. As long
as the receiver is operating at low error rates, an occasional error will have a negligible
effect on the convergence of the algorithm.

If the channel response changes, this change is reflected in the coefficients $\{f_k\}$
of the equivalent discrete-time channel model. It is also reflected in the error signal
$\varepsilon_k$, since it depends on $\{f_k\}$. Hence, the tap weights will be changed according to
Equation 10.1-11 to reflect the change in the channel. A similar change in the tap
weights occurs if the statistics of the noise or the information sequence change. Thus,
the equalizer is adaptive.


10.1-3 Convergence Properties of the LMS Algorithm 


The convergence properties of the LMS algorithm given by Equation 10.1-11 are
governed by the step-size parameter $\Delta$. We shall now consider the choice of the parameter
$\Delta$ to ensure convergence of the steepest-descent algorithm in Equation 10.1-7, which
employs the exact value of the gradient.

From Equations 10.1-7 and 10.1-8, we have

$$\mathbf{C}_{k+1} = \mathbf{C}_k - \Delta \mathbf{G}_k = (\mathbf{I} - \Delta\boldsymbol{\Gamma})\mathbf{C}_k + \Delta\boldsymbol{\xi} \tag{10.1-20}$$

where $\mathbf{I}$ is the identity matrix, $\boldsymbol{\Gamma}$ is the autocorrelation matrix of the received signal,
$\mathbf{C}_k$ is the $(2K+1)$-dimensional vector of equalizer tap gains, and $\boldsymbol{\xi}$ is the vector of
cross correlations given by Equation 9.4-45. The recursive relation in Equation 10.1-20
can be represented as a closed-loop control system as shown in Figure 10.1-3.
Unfortunately, the set of $2K+1$ first-order difference equations in Equation 10.1-20 are
coupled through the autocorrelation matrix $\boldsymbol{\Gamma}$. In order to solve these equations and,
thus, establish the convergence properties of the recursive algorithm, it is mathematically
convenient to decouple the equations by performing a linear transformation. The
appropriate transformation is obtained by noting that the matrix $\boldsymbol{\Gamma}$ is Hermitian and,
hence, can be represented as

$$\boldsymbol{\Gamma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H \tag{10.1-21}$$



where $\mathbf{U}$ is the normalized modal matrix of $\boldsymbol{\Gamma}$ and $\boldsymbol{\Lambda}$ is a diagonal matrix with diagonal
elements equal to the eigenvalues of $\boldsymbol{\Gamma}$ (see Appendix A).

FIGURE 10.1-3

Closed-loop control system representation of the
recursive relation in Equation 10.1-20.

When Equation 10.1-21 is substituted into Equation 10.1-20 and if we define the
transformed (orthogonalized) vectors $\mathbf{C}_k^o = \mathbf{U}^H \mathbf{C}_k$ and $\boldsymbol{\xi}^o = \mathbf{U}^H \boldsymbol{\xi}$, we obtain

$$\mathbf{C}_{k+1}^o = (\mathbf{I} - \Delta\boldsymbol{\Lambda})\mathbf{C}_k^o + \Delta\boldsymbol{\xi}^o \tag{10.1-22}$$

This set of first-order difference equations is now decoupled. Their convergence is
determined from the homogeneous equation

$$\mathbf{C}_{k+1}^o = (\mathbf{I} - \Delta\boldsymbol{\Lambda})\mathbf{C}_k^o \tag{10.1-23}$$

We see that the recursive relation will converge provided that all the poles lie inside the
unit circle, i.e.,

$$|1 - \Delta\lambda_k| < 1, \qquad k = -K, \ldots, -1, 0, 1, \ldots, K \tag{10.1-24}$$

where $\{\lambda_k\}$ is the set of $2K+1$ (possibly nondistinct) eigenvalues of $\boldsymbol{\Gamma}$. Since $\boldsymbol{\Gamma}$ is an
autocorrelation matrix, it is positive-definite and, hence, $\lambda_k > 0$ for all $k$. Consequently,
convergence of the recursive relation in Equation 10.1-22 is ensured if $\Delta$ satisfies the
inequality

$$0 < \Delta < \frac{2}{\lambda_{\max}} \tag{10.1-25}$$

where $\lambda_{\max}$ is the largest eigenvalue of $\boldsymbol{\Gamma}$.

Since the largest eigenvalue of a positive-definite matrix is less than the sum of all
the eigenvalues of the matrix and, furthermore, since the sum of the eigenvalues of a
matrix is equal to its trace, we have the following simple upper bound on $\lambda_{\max}$:

$$\lambda_{\max} < \sum_{k=-K}^{K} \lambda_k = \operatorname{tr} \boldsymbol{\Gamma} = (2K+1)\Gamma_{kk} = (2K+1)(x_0 + N_0) \tag{10.1-26}$$


From Equations 10.1-23 and 10.1-24 we observe that rapid convergence occurs
when $|1 - \Delta\lambda_k|$ is small, i.e., when the pole positions are far from the unit circle. But
we cannot achieve this desirable condition and still satisfy Equation 10.1-25 if there
is a large difference between the largest and smallest eigenvalues of $\boldsymbol{\Gamma}$. In other words,
even if we select $\Delta$ to be near the upper bound given in Equation 10.1-25, the
convergence rate of the recursive MSE algorithm is determined by the smallest eigenvalue
$\lambda_{\min}$. Consequently, the ratio $\lambda_{\max}/\lambda_{\min}$ ultimately determines the convergence rate. If
$\lambda_{\max}/\lambda_{\min}$ is small, $\Delta$ can be selected so as to achieve rapid convergence. However, if
the ratio $\lambda_{\max}/\lambda_{\min}$ is large, as is the case when the channel frequency response has
deep spectral nulls, the convergence rate of the algorithm will be slow.
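
The effect of the eigenvalue spread can be made concrete by examining the pole magnitudes $|1 - \Delta\lambda_k|$ of the decoupled recursion. The following sketch uses hypothetical eigenvalue sets and hypothetical function names; it counts the iterations needed for the slowest mode to decay by a given factor.

```python
import math

def mode_poles(eigenvalues, delta):
    """Pole magnitudes |1 - delta * lambda_k| of the decoupled recursion (Equation 10.1-23)."""
    return [abs(1.0 - delta * lam) for lam in eigenvalues]

def iterations_to_decay(eigenvalues, delta, factor=100.0):
    """Iterations for the slowest mode to shrink by `factor`; None if a pole is not inside the unit circle."""
    slowest = max(mode_poles(eigenvalues, delta))
    if slowest >= 1.0:
        return None
    if slowest == 0.0:
        return 1
    return math.ceil(math.log(factor) / -math.log(slowest))

well_conditioned = [0.9, 1.0, 1.1]     # hypothetical eigenvalues, small spread
ill_conditioned = [0.01, 1.0, 1.1]     # hypothetical eigenvalues, large spread
for lams in (well_conditioned, ill_conditioned):
    delta = 1.0 / max(lams)            # half of the bound 2 / lambda_max
    print(lams, iterations_to_decay(lams, delta))
```

With the same step size, the set containing a very small eigenvalue needs far more iterations, because the pole associated with $\lambda_{\min}$ lies close to the unit circle.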


10.1-4 Excess MSE due to Noisy Gradient Estimates 

The recursive algorithm in Equation 10.1-11 for adjusting the coefficients of the linear
equalizer employs unbiased noisy estimates of the gradient vector. The noise in these
estimates causes random fluctuations in the coefficients about their optimal values and,
thus, leads to an increase in the MSE at the output of the equalizer. That is, the final
MSE is $J_{\min} + J_\Delta$, where $J_\Delta$ is the variance of the measurement noise. The term $J_\Delta$ due
to the estimation noise has been termed excess mean square error by Widrow (1966).

The total MSE at the output of the equalizer for any set of coefficients $\mathbf{C}$ can be
expressed as

$$J = J_{\min} + (\mathbf{C} - \mathbf{C}_{\text{opt}})^H \boldsymbol{\Gamma} (\mathbf{C} - \mathbf{C}_{\text{opt}}) \tag{10.1-27}$$

where $\mathbf{C}_{\text{opt}}$ represents the optimum coefficients, which satisfy Equation 10.1-6. This
expression for the MSE can be simplified by performing the linear orthogonal
transformation used above to establish convergence. The result of this transformation applied
to Equation 10.1-27 is

$$J = J_{\min} + \sum_{k=-K}^{K} \lambda_k \left| c_k^o - c_{k\,\text{opt}}^o \right|^2 \tag{10.1-28}$$

where the $\{c_k^o\}$ are the set of transformed equalizer coefficients. The excess MSE is the
expected value of the second term in Equation 10.1-28, i.e.,

$$J_\Delta = \sum_{k=-K}^{K} \lambda_k E\left| c_k^o - c_{k\,\text{opt}}^o \right|^2 \tag{10.1-29}$$

It has been shown by Widrow (1970) that the excess MSE is

$$J_\Delta = \Delta^2 J_{\min} \sum_{k=-K}^{K} \frac{\lambda_k^2}{1 - (1 - \Delta\lambda_k)^2} \tag{10.1-30}$$


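The expression in Equation 10.1-30 can be simplified when $\Delta$ is selected such that
$\Delta\lambda_k \ll 1$ for all $k$. Then

$$J_\Delta \approx \tfrac{1}{2}\Delta J_{\min} \sum_{k=-K}^{K} \lambda_k \approx \tfrac{1}{2}\Delta J_{\min} \operatorname{tr} \boldsymbol{\Gamma} \approx \tfrac{1}{2}\Delta (2K+1) J_{\min} (x_0 + N_0) \tag{10.1-31}$$

Note that $x_0 + N_0$ represents the received signal plus noise power.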

It is desirable to have $J_\Delta < J_{\min}$. That is, $\Delta$ should be selected such that

$$\frac{J_\Delta}{J_{\min}} \approx \tfrac{1}{2}\Delta (2K+1)(x_0 + N_0) < 1$$

or, equivalently,

$$\Delta < \frac{2}{(2K+1)(x_0 + N_0)} \tag{10.1-32}$$

For example, if $\Delta$ is selected as

$$\Delta \approx \frac{0.2}{(2K+1)(x_0 + N_0)} \tag{10.1-33}$$

the degradation in the output SNR of the equalizer due to the excess MSE is less than
1 dB.
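
The arithmetic behind Equations 10.1-31 to 10.1-33 can be checked numerically. The sketch below uses hypothetical function names and illustrative values (11 taps, unit input power), and it assumes that the output SNR varies roughly as the reciprocal of the MSE, so that the penalty in decibels is about $10 \log_{10}(1 + J_\Delta/J_{\min})$.

```python
import math

def step_size_bound(num_taps, input_power):
    """Upper bound of Equation 10.1-32: 2 / ((2K+1)(x0 + N0))."""
    return 2.0 / (num_taps * input_power)

def excess_mse_ratio(delta, num_taps, input_power):
    """Approximation of Equation 10.1-31: J_delta / J_min = 0.5 * delta * (2K+1) * (x0 + N0)."""
    return 0.5 * delta * num_taps * input_power

num_taps, power = 11, 1.0                  # illustrative values
delta = 0.2 / (num_taps * power)           # the choice in Equation 10.1-33
ratio = excess_mse_ratio(delta, num_taps, power)
penalty_db = 10 * math.log10(1 + ratio)    # assumes output SNR varies roughly as 1 / MSE
print(round(step_size_bound(num_taps, power), 3), round(ratio, 3), round(penalty_db, 2))
```

Under these assumptions the choice in Equation 10.1-33 gives $J_\Delta/J_{\min} \approx 0.1$, a penalty of roughly 0.4 dB, which is consistent with the statement that the degradation is less than 1 dB.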

The analysis given above on the excess mean square error is based on the assumption
that the mean value of the equalizer coefficients has converged to the optimum value
$\mathbf{C}_{\text{opt}}$. Under this condition, the step size $\Delta$ should satisfy the bound in Equation
10.1-32. On the other hand, we have determined that convergence of the mean coefficient
vector requires that $\Delta < 2/\lambda_{\max}$. While a choice of $\Delta$ near the upper bound $2/\lambda_{\max}$
may lead to initial convergence of the deterministic (known) steepest-descent gradient
algorithm, such a large value of $\Delta$ will usually result in instability of the LMS stochastic
gradient algorithm.

The initial convergence or transient behavior of the LMS algorithm has been in- 
vestigated by several researchers. Their results clearly indicate that the step size must 
be reduced in direct proportion to the length of the equalizer as specified by Equa- 
tion 10.1-32. Hence, the upper bound given by Equation 10.1-32 is also necessary 
to ensure the initial convergence of the LMS algorithm. The papers by Gitlin and 
Weinstein (1979) and Ungerboeck (1972) contain analyses of the transient behavior 
and the convergence properties of the LMS algorithm. 

The following example serves to reinforce the important points made above re- 
garding the initial convergence of the LMS algorithm. 

EXAMPLE 10.1-1. The LMS algorithm was used to adaptively equalize a communication
channel for which the autocorrelation matrix $\boldsymbol{\Gamma}$ has an eigenvalue spread of
$\lambda_{\max}/\lambda_{\min} = 11$. The number of taps selected for the equalizer was $2K+1 = 11$. The
input signal plus noise power $x_0 + N_0$ was normalized to unity. Hence, the upper bound
on $\Delta$ given by Equation 10.1-32 is 0.18. Figure 10.1-4 illustrates the initial convergence
characteristics of the LMS algorithm for $\Delta = 0.045$, 0.09, and 0.115, by averaging the
(estimated) MSE in 200 simulations. We observe that by selecting $\Delta = 0.09$ (one-half
of the upper bound) we obtain relatively fast initial convergence. If we divide $\Delta$ by a
factor of 2 to $\Delta = 0.045$, the convergence rate is reduced but the excess mean square
error is also reduced, so that the LMS algorithm performs better in steady state (in a
time-invariant signal environment). Finally, we note that a choice of $\Delta = 0.115$, which
is still far below the upper bound, causes large undesirable fluctuations in the output
MSE of the algorithm.

FIGURE 10.1-4

Initial convergence characteristics of the LMS
algorithm with different step sizes. (From Digital
Signal Processing, by J. G. Proakis and D. G.
Manolakis, 1995, Prentice Hall Company. Reprinted
with permission of the publisher.)

In a digital implementation of the LMS algorithm, the choice of the step-size
parameter becomes even more critical. In an attempt to reduce the excess mean square
error, it is possible to reduce the step-size parameter to the point where the total mean
square error actually increases. This condition occurs when the estimated gradient
components of the vector $\varepsilon_k \mathbf{V}_k^*$ after multiplication by the small step-size parameter
$\Delta$ are smaller than one-half of the least significant bit in the fixed-point representation
of the equalizer coefficients. In such a case, adaptation ceases. Consequently, it is
important for the step size to be large enough to bring the equalizer coefficients in the
vicinity of $\mathbf{C}_{\text{opt}}$. If it is desired to decrease the step size significantly, it is necessary
to increase the precision in the equalizer coefficients. Typically, 16 bits of precision
may be used for the coefficients, with about 10-12 of the most significant bits used for
arithmetic operations in the equalization of the data. The remaining least significant
bits are required to provide the necessary precision for the adaptation process. Thus, the
scaled estimated gradient components $\Delta \varepsilon_k \mathbf{V}_k^*$ usually affect only the least-significant
bits in any one iteration. In effect, the added precision also allows for the noise to be
averaged out, since many incremental changes in the least-significant bits are required
before any change occurs in the upper more significant bits used in arithmetic operations
for equalizing the data. For an analysis of roundoff errors in a digital implementation of
the LMS algorithm, the reader is referred to the papers by Gitlin and Weinstein (1979),
Gitlin et al. (1982), and Caraiscos and Liu (1984).
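
The stalling effect described above is easy to reproduce. The following sketch models a real-valued coefficient stored with a given number of fractional bits and rounded after each update; the function names, the rounding model, and the numerical values are assumptions made for illustration only.

```python
def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def fixed_point_update(c, scaled_gradient, frac_bits):
    """Add the scaled gradient term and store the result at the coefficient precision."""
    return quantize(c + scaled_gradient, frac_bits)

c = 0.5
step = 0.001                             # hypothetical value of delta * eps_k * v_k^*
print(fixed_point_update(c, step, 8))    # half an LSB is 1/512, larger than the step
print(fixed_point_update(c, step, 16))   # half an LSB is 1/131072, smaller than the step
```

With 8 fractional bits the update is rounded away and the coefficient does not move, whereas with 16 fractional bits the same update changes the stored value.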

As a final point, we should indicate that the LMS algorithm is appropriate for
tracking slowly time-variant signal statistics. In such a case, the minimum MSE and
the optimum coefficient vector will be time-variant. In other words, $J_{\min}(n)$ is a function
of time and the $(2K+1)$-dimensional error surface is moving with the time index $n$.
The LMS algorithm attempts to follow the moving minimum $J_{\min}(n)$ in the
$(2K+1)$-dimensional space, but it is always lagging behind due to its use of (estimated) gradient
vectors. As a consequence, the LMS algorithm incurs another form of error, called the
lag error, whose mean square value decreases with an increase in the step size $\Delta$. The
total MSE error can now be expressed as

$$J_{\text{total}} = J_{\min}(n) + J_\Delta + J_l \tag{10.1-34}$$

where $J_l$ denotes the mean square error due to the lag.

In any given nonstationary adaptive equalization problem, if we plot the errors $J_\Delta$
and $J_l$ as a function of $\Delta$, we expect these errors to behave as illustrated in Figure 10.1-5.
We observe that $J_\Delta$ increases with an increase in $\Delta$ while $J_l$ decreases with an increase
in $\Delta$. The total error will exhibit a minimum, which will determine the optimum choice
of the step-size parameter.

When the statistical time variations of the signal occur rapidly, the lag error will
dominate the performance of the adaptive equalizer. In such a case, $J_l \gg J_{\min} + J_\Delta$,
even when the largest possible value of $\Delta$ is used. When this condition occurs, the
LMS algorithm is inappropriate for the application and one must rely on the more
complex recursive least-squares algorithms described in Section 10.4 to obtain faster
convergence.

FIGURE 10.1-5

[Plot with a curve labeled "$J_\Delta$ error due to noisy gradients".] (From Digital Signal
Processing, by J. G. Proakis and D. G. Manolakis, 1995, Prentice Hall Company.
Reprinted with permission of the publisher.)


10.1-5 Accelerating the Initial Convergence Rate in the LMS Algorithm

As we have observed, the initial convergence rate of the LMS algorithm for any given
channel characteristic is controlled by the step-size parameter $\Delta$. The initial
convergence rate is strongly influenced by the channel spectral characteristics, which are
related to the eigenvalues $\{\lambda_n\}$ of the received signal covariance matrix. If the channel
amplitude and phase distortions are small, the eigenvalue ratio $\lambda_{\max}/\lambda_{\min}$ is close to
unity and, hence, the equalizer converges to its optimum tap coefficients relatively fast.
On the other hand, if the channel exhibits poor spectral characteristics, such as
relatively large attenuation in a part of its spectrum, the eigenvalue ratio $\lambda_{\max}/\lambda_{\min} \gg 1$
and, hence, the convergence rate of the LMS algorithm will be slow.

A considerable effort has been spent by researchers on methods to accelerate the
initial convergence of the LMS algorithm. A simple remedy is to begin with a large step
size, say $\Delta_0$, and reduce the step size as the tap coefficients converge to their optimum
values. In other words, we use a sequence of step sizes, $\Delta_0 > \Delta_1 > \Delta_2 > \cdots
> \Delta_m = \Delta$, where $\Delta$ is the final step size to be used in steady-state operation of the
LMS algorithm.

An alternative method for accelerating initial convergence has been proposed and
investigated by Chang (1971) and Qureshi (1977). This method is based on introducing
additional parameters in the LMS algorithm by replacing the step size with a weighting
matrix $\mathbf{W}$. In such a case, the LMS algorithm is generalized to the form

$$\begin{aligned} \hat{\mathbf{C}}_{k+1} &= \hat{\mathbf{C}}_k - \mathbf{W}\hat{\mathbf{G}}_k \\ &= \hat{\mathbf{C}}_k + \mathbf{W}(\boldsymbol{\xi} - \boldsymbol{\Gamma}\hat{\mathbf{C}}_k) \\ &= \hat{\mathbf{C}}_k + \mathbf{W}\varepsilon_k \mathbf{V}_k^* \end{aligned} \tag{10.1-35}$$

where $\mathbf{W}$ is the weighting matrix and the last line uses the estimate $-\varepsilon_k \mathbf{V}_k^*$ of
Equation 10.1-10 in place of the exact gradient $\boldsymbol{\Gamma}\mathbf{C}_k - \boldsymbol{\xi}$. Ideally, $\mathbf{W} = \boldsymbol{\Gamma}^{-1}$, or if $\boldsymbol{\Gamma}$ is
estimated, then $\mathbf{W}$ can be set equal to the inverse of the estimate.

When the training sequence for the equalizer is periodic with period $N$, the
covariance matrix $\boldsymbol{\Gamma}$ is Toeplitz and circulant and its inverse is circulant. In this case,
the multiplication by the weighting matrix $\mathbf{W}$ can be simplified considerably by the
implementation of a single finite duration impulse response (FIR) filter with weights
equal to the first row of $\mathbf{W}$, as indicated by Qureshi (1977). That is, the fast update
algorithm that is equivalent to multiplying the gradient vector $\hat{\mathbf{G}}_k$ by $\mathbf{W}$ is simply
implemented as shown in Figure 10.1-6, by inserting the FIR filter with $N$ coefficients
$w_0, w_1, \ldots, w_{N-1}$ in the path of the periodic input sequence before it is used for tap
coefficient adjustment.

Qureshi (1977) described a method for estimating the weights from the received
signal. The basic steps are as follows:

1. Collect one period ($N$ symbols) of received data $v_0, v_1, \ldots, v_{N-1}$ in the equalizer
delay line.

2. Compute the $N$-point discrete Fourier transform (DFT) of $\{v_n\}$, denoted as $\{R_n\}$.

3. Compute the discrete power spectrum $|R_n|^2$. If we neglect the noise, $|R_n|^2$
corresponds to $N$ times the eigenvalues of the circulant covariance matrix of the signal
at the input to the equalizer. Then, add $N$ times the estimate of the noise variance
$\sigma^2$ to $|R_n|^2$.

4. Compute the inverse DFT of the sequence $1/(|R_n|^2 + N\sigma^2)$, $n = 0, 1, \ldots, N-1$.
This yields the sequence $\{w_n\}$ of filter coefficients for the filter shown in
Figure 10.1-6.

5. The algorithm for adjusting the equalizer tap coefficients now becomes

$$c_j^{(k+1)} = c_j^{(k)} + \varepsilon_k \sum_{m=0}^{N-1} w_m v_{k-j-m}^*, \qquad j = 0, 1, \ldots, N-1 \tag{10.1-36}$$
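
Steps 2 to 4 above can be sketched with a direct DFT. The code below is an illustration only: the function names and the four-sample period `v` are hypothetical, and the final check sets the noise variance to zero and uses an unnormalized circular autocorrelation of `v` as a stand-in for the circulant covariance, which is an assumption of this sketch.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(N^2) DFT; adequate for one equalizer-length period."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def fast_startup_weights(v, noise_var):
    """Steps 2 to 4: w = IDFT(1 / (|R_n|^2 + N * noise_var))."""
    n = len(v)
    spectrum = dft(v)                                                    # step 2: {R_n}
    inv_power = [1.0 / (abs(r) ** 2 + n * noise_var) for r in spectrum]  # step 3
    return dft(inv_power, inverse=True)                                  # step 4: {w_n}

def circular_autocorrelation(v):
    n = len(v)
    return [sum(v[(i + m) % n] * complex(v[i]).conjugate() for i in range(n))
            for m in range(n)]

def circular_convolve(a, b):
    n = len(a)
    return [sum(a[i] * b[(m - i) % n] for i in range(n)) for m in range(n)]

v = [1.0, 0.5, 0.0, 0.0]                 # hypothetical period of received samples
w = fast_startup_weights(v, noise_var=0.0)
check = circular_convolve(w, circular_autocorrelation(v))
print([round(abs(c), 6) for c in check])
```

With the noise term set to zero, circularly convolving $\{w_n\}$ with the autocorrelation of the period yields a unit impulse, which is the sense in which the FIR filter applies the inverse of the circulant covariance.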



FIGURE 10.1-6

Fast start-up technique for an adaptive equalizer.


10.1-6 Adaptive Fractionally Spaced Equalizer: The Tap Leakage Algorithm

As described in Section 9.4-4, an FSE is preferable to a symbol rate equalizer (SRE) 
when the channel characteristics are unknown at the receiver. In such a case, the FSE 
combines the operations of matched filtering and equalization of intersymbol interfer- 
ence into a single filter. By processing samples at the Nyquist rate, the FSE adapts its 
coefficients to compensate for any timing phase within a symbol. Thus, its performance 
is insensitive to the sampling time within a symbol interval, as discussed previously. 
Consequently, from a performance viewpoint, the FSE is equivalent to a matched filter 
followed by a symbol rate sampler, and followed by an SRE. 

The LMS algorithm and any of its variants can be used to adjust the coefficients of 
the FSE adaptively. Suitable training signals for initial adjustment may take the form of 
an aperiodic pseudorandom sequence or a periodic pseudorandom sequence, where the 
period is equal to the time span of the equalizer, i.e., a sequence of period P is used to 
train an FSE with PN/M coefficients, where the tap spacing is MT /N. In the case of a 
periodic sequence for training, the update of each of the coefficients may be performed 
periodically, once in every period of the sequence based on the average gradient LMS 
algorithm given by Equations 10.1-16 and 10.1-17. 

In a digital implementation of the LMS algorithm for an FSE, some care must
be exercised in selecting the step-size parameter $\Delta$. It has been shown by Gitlin and
Weinstein (1981) and further described by Qureshi (1985) that in an FSE, a fraction
$(N - M)/N$ of the eigenvalues of the received signal covariance matrix are very small.
These small eigenvalues and their corresponding eigenvectors are related to the spectral
characteristics of the noise in the frequency band $(1 + \beta)/2T < |f| < 1/T$. As
a consequence, the output MSE becomes insensitive to deviations in the coefficient
values corresponding to these eigenvalues. In such cases, errors due to finite precision
arithmetic accumulate along the eigenvectors (frequency band) corresponding to the
small eigenvalues and eventually cause overflows in the coefficient values, without
significantly affecting the overall MSE.

A solution to this problem has been given in the paper by Gitlin et al. (1982). Instead
of minimizing the MSE given by Equation 9.4-42, we minimize the performance index

$$J = J_{\text{MSE}} + \mu \sum_{k=-K}^{K} |c_k|^2 \tag{10.1-37}$$

where $J_{\text{MSE}}$ is the conventional MSE and $\mu$ is a small positive constant. Thus, the
ill-conditioning of the received signal covariance matrix is avoided. The minimization
of $J$ leads to the following “modified LMS” algorithm (see Problem 10.5):

$$\hat{\mathbf{C}}_{k+1} = (1 - \Delta\mu)\hat{\mathbf{C}}_k + \Delta\, \varepsilon_k \mathbf{V}_k^* \tag{10.1-38}$$

This algorithm is called the tap-leakage algorithm.
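
The effect of the leakage factor $(1 - \Delta\mu)$ can be seen in a short sketch of Equation 10.1-38. The function name and the numerical values are hypothetical; the example drives the taps with a zero error signal to isolate the leakage term.

```python
def tap_leakage_update(c, err, window, delta, mu):
    """One iteration of Equation 10.1-38 for a tap vector c and input window V_k."""
    leak = 1.0 - delta * mu
    return [leak * cj + delta * err * complex(vj).conjugate()
            for cj, vj in zip(c, window)]

# With no error signal driving the taps, the leakage term shrinks them geometrically.
taps = [1.0 + 0j, -0.5 + 0j]
for _ in range(3):
    taps = tap_leakage_update(taps, err=0j, window=[0j, 0j], delta=0.1, mu=0.5)
print(taps)
```

Any coefficient component that the error signal does not sustain decays toward zero instead of accumulating roundoff, which is how the algorithm avoids the overflow problem associated with the very small eigenvalues.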

In adapting the tap coefficients of an FSE, the tap adjustments, as described above,
are made periodically either at the symbol rate or slower when a periodic training
sequence is transmitted. However, the samples at the input to the FSE occur at a faster
rate. For example, if we consider a $T/2$ FSE, there are two samples per information
symbol. An interesting question is whether or not it is possible to increase the initial
convergence rate of an FSE by adapting its coefficients at the sampling rate. If the tap
adjustments are performed at the sampling rate, one must generate additional desired
signal values corresponding to sample values that fall between values of the desired
symbols. That is, one must design a filter that performs intersymbol interpolation in
order to generate the intermediate desired sample sequence. This problem has been
considered by Gitlin and Weinstein (1981), Cioffi and Kailath (1984), and Ling (1989).
The results given in the paper by Ling provide an answer to the question.

First we note that the initial convergence of the LMS algorithm depends on the
number of nontrivial eigenvalues of the autocorrelation matrix of the received signal.
This number is equal to the number of independent parameters that are to be optimized.
For example, an SRE that has $K$ taps and spans a time interval of $KT$ seconds has $K$
independent parameters to be optimized. In contrast, a $T/2$ complex-valued FSE that
spans the same time interval has $2K$ tap coefficients, but its autocorrelation matrix has
$K$ nontrivial (and $K$ trivial) eigenvalues and, thus, it has $K$ independent parameters
to be optimized. Consequently, the complex-valued $T/2$ FSE that is adapted at the
symbol rate has the same convergence rate as the SRE. Now, if the complex-valued FSE
employs interpolation to update its coefficients at all time instants $nT/2$, the number of
independent parameters to be optimized is $2K$. In this case, there are two autocorrelation
matrices, one corresponding to samples at $nT/2$, and the other corresponding to samples
at $(n+1)T/2$, and each matrix has $K$ nontrivial eigenvalues. That is, the $T/2$ FSE that
employs interpolation adjusts one set of $K$ parameters in one update and the second set
of $K$ parameters in the next update. Therefore, the convergence rate of the interpolated
FSE will be approximately the same as the convergence rate of the symbol-updated FSE.

In the case of a phase-splitting FSE (PS-FSE), which is implemented at bandpass,
with a time span of $KT$ seconds and tap spacing $T/N$, where $N > 2$, e.g., $N = 3$
or 4, there are $KN$ parameters to be optimized. In this case, Ling (1989) showed that
the convergence rate of the PS-FSE was approximately a factor of 2 slower than the
convergence rate of the conventional complex-valued FSE, when the PS-FSE is adjusted
at the symbol rate. By employing ideal intersymbol interpolation, the convergence rate
of the PS-FSE is increased by approximately a factor of 2 compared to symbol rate
adjustment of the PS-FSE. Thus, the PS-FSE with intersymbol interpolation achieves
the same convergence rate as the conventional complex-valued FSE that is adjusted at
the symbol rate.


10.1-7 An Adaptive Channel Estimator for ML Sequence Detection 

The ML sequence detection criterion implemented via the Viterbi algorithm as
embodied in the metric computation given by Equation 9.3-23 requires knowledge of the
equivalent discrete-time channel coefficients $\{f_k\}$. To accommodate a channel that is
unknown or slowly time varying, one may include a channel estimator connected in
parallel with the detection algorithm, as shown in Figure 10.1-7. The channel estimator,
which is shown in Figure 10.1-8, is identical in structure to the linear transversal
equalizer discussed previously in Section 10.1. In fact, the channel estimator is
a replica of the equivalent discrete-time channel filter that models the intersymbol
interference. The estimated tap coefficients, denoted by $\{\hat{f}_k\}$, are adjusted recursively
to minimize the MSE between the actual received sequence and the output of the
estimator. For example, the LMS steepest-descent algorithm in a decision-directed mode of
operation is

$$\hat{\mathbf{f}}_{k+1} = \hat{\mathbf{f}}_k + \Delta\, \varepsilon_k \tilde{\mathbf{I}}_k^* \tag{10.1-39}$$

where $\hat{\mathbf{f}}_k$ is the vector of tap gain coefficients at the $k$th iteration, $\Delta$ is the step size,
$\varepsilon_k = v_k - \hat{v}_k$ is the error signal, and $\tilde{\mathbf{I}}_k$ denotes the vector of detected information
symbols in the channel estimator at the $k$th iteration.

FIGURE 10.1-7

Block diagram of method for estimating the channel
characteristics for the Viterbi algorithm.

We now show that when the MSE between $v_k$ and $\hat{v}_k$ is minimized, the resulting
values of the tap gain coefficients of the channel estimator are the values of the
discrete-time channel model. For mathematical tractability, we assume that the detected
information sequence $\{\tilde{I}_k\}$ is correct, i.e., $\{\tilde{I}_k\}$ is identical to the transmitted sequence
$\{I_k\}$. This is a reasonable assumption when the system is operating at a low probability
of error. Thus, the MSE between the received signal $v_k$ and the estimate $\hat{v}_k$ is

$$J(\hat{\mathbf{f}}) = E\left( \left| v_k - \sum_{j=0}^{N-1} \hat{f}_j I_{k-j} \right|^2 \right) \tag{10.1-40}$$



FIGURE 10.1-8 

Adaptive transversal filter for estimating the channel dispersion. 




The tap coefficients $\{\hat{f}_k\}$ that minimize $J(\hat{\mathbf{f}})$ in Equation 10.1-40 satisfy the set of N
linear equations

$$\sum_{j=0}^{N-1} \hat{f}_j R_{kj} = d_k, \qquad k = 0, 1, \ldots, N-1 \tag{10.1-41}$$

where

$$R_{kj} = E(I_k I_j^{*}), \qquad d_k = \sum_{j=0}^{L} f_j R_{kj} \tag{10.1-42}$$

From Equations 10.1-41 and 10.1-42, we conclude that, as long as the information
sequence $\{I_k\}$ is uncorrelated, the optimum coefficients are exactly equal to the respective
values of the equivalent discrete-time channel. It is also apparent that when the
number of taps N in the channel estimator is greater than or equal to L + 1, the optimum
tap gain coefficients $\{\hat{f}_k\}$ are equal to the respective values of the $\{f_k\}$, even when the
information sequence is correlated. Subject to the above conditions, the minimum MSE
is simply equal to the noise variance $N_0$.

In the above discussion, the estimated information sequence at the output of the 
Viterbi algorithm or the probabilistic symbol-by-symbol algorithm was used in making 
adjustments of the channel estimator. For start-up operation, one may send a short 
training sequence to perform the initial adjustment of the tap coefficients, as is usually 
done in the case of the linear transversal equalizer. In an adaptive mode of operation, 
the receiver simply uses its own decisions to form an error signal. 
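
The recursion in Equation 10.1-39 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the text: the symbols are binary (±1) and known to the estimator (training mode), the noise level defaults to zero, and the names `lms_channel_estimator`, `simulate`, `step`, and `noise_std` are hypothetical. The channel taps 0.26, 0.93, 0.26 are borrowed from the example channel quoted later in Section 10.4-1.

```python
import random

def lms_channel_estimator(symbols, received, num_taps, step):
    """Adapt channel tap estimates with the LMS recursion of Equation 10.1-39."""
    f_hat = [0.0] * num_taps
    window = [0.0] * num_taps            # I_k, I_{k-1}, ..., I_{k-N+1}
    for sym, v in zip(symbols, received):
        window = [sym] + window[:-1]
        v_hat = sum(f * s for f, s in zip(f_hat, window))   # estimator output
        err = v - v_hat                                      # eps_k = v_k - v_hat_k
        # f_hat(k+1) = f_hat(k) + step * eps_k * conj(I_k)
        f_hat = [f + step * err * s.conjugate() for f, s in zip(f_hat, window)]
    return f_hat

def simulate(channel, num_symbols=2000, step=0.05, noise_std=0.0, seed=7):
    """Pass random +/-1 symbols through `channel` and run the estimator."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols)]
    received = []
    for k in range(num_symbols):
        v = sum(channel[j] * symbols[k - j]
                for j in range(len(channel)) if k - j >= 0)
        received.append(v + rng.gauss(0.0, noise_std))
    return lms_channel_estimator(symbols, received, len(channel), step)

estimate = simulate([0.26, 0.93, 0.26])
```

With uncorrelated symbols and N = L + 1 taps, the estimate approaches the true channel taps, as the argument above predicts. In decision-directed operation the known symbols would be replaced by the detector's decisions.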


10.2 ADAPTIVE DECISION-FEEDBACK EQUALIZER

As in the case of the linear adaptive equalizer, the coefficients of the feedforward 
filter and the feedback filter in a decision-feedback equalizer (DFE) may be adjusted 
recursively, instead of inverting a matrix as implied by Equation 9.5-3. Based on the 
minimization of the MSE at the output of the DFE, the steepest-descent algorithm takes 
the form 

$$\mathbf{C}_{k+1} = \mathbf{C}_k + \Delta E(\varepsilon_k \mathbf{V}_k^{*}) \tag{10.2-1}$$

where $\mathbf{C}_k$ is the vector of equalizer coefficients in the kth signal interval, $E(\varepsilon_k \mathbf{V}_k^{*})$ is the
cross correlation of the error signal $\varepsilon_k = I_k - \hat{I}_k$ with $\mathbf{V}_k = [v_{k+K_1} \cdots v_k \ I_{k-1} \cdots I_{k-K_2}]^{t}$,
representing the signal values in the feedforward and feedback filters at time t = kT.
The MSE is minimized when the cross-correlation vector $E(\varepsilon_k \mathbf{V}_k^{*}) = 0$ as $k \to \infty$.

Since the exact cross-correlation vector is unknown at any time instant, we use
as an estimate the vector $\varepsilon_k \mathbf{V}_k^{*}$ and average out the noise in the estimate through the
recursive equation

$$\mathbf{C}_{k+1} = \mathbf{C}_k + \Delta\,\varepsilon_k \mathbf{V}_k^{*} \tag{10.2-2}$$

This is the LMS algorithm for the DFE.

FIGURE 10.2-1

Decision-feedback equalizer. 


As in the case of a linear equalizer, we may use a training sequence to adjust the
coefficients of the DFE initially. Upon convergence to the (near-) optimum coefficients
(minimum MSE), we may switch to a decision-directed mode where the decisions at
the output of the detector are used in forming the error signal $\tilde{\varepsilon}_k$ and fed to the feedback
filter. This is the adaptive mode of the DFE, which is illustrated in Figure 10.2-1. In
this case, the recursive equation for adjusting the equalizer coefficients is

$$\mathbf{C}_{k+1} = \mathbf{C}_k + \Delta\,\tilde{\varepsilon}_k \mathbf{V}_k^{*} \tag{10.2-3}$$

where $\tilde{\varepsilon}_k = \tilde{I}_k - \hat{I}_k$ and $\mathbf{V}_k = [v_{k+K_1} \cdots v_k \ \tilde{I}_{k-1} \cdots \tilde{I}_{k-K_2}]^{t}$.

The performance characteristics of the LMS algorithm for the DFE are essentially
the same as those developed in Sections 10.1-3 and 10.1-4 for the linear adaptive
equalizer.
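
The training and decision-directed recursions of Equations 10.2-2 and 10.2-3 can be sketched as follows. This is a minimal illustration, assuming binary (±1) symbols, a simple sign detector, and a hypothetical noise-free two-tap channel chosen only so that an exact solution exists; the names `lms_dfe`, `demo`, `step`, and `n_train` are not from the text.

```python
import random

def lms_dfe(received, known, K1, K2, step, n_train):
    """LMS-adapted DFE with K1+1 feedforward taps and K2 feedback taps.

    For the first n_train symbols the known (training) symbols form the error
    and fill the feedback filter; afterwards the detector's own decisions are
    used (decision-directed mode).
    """
    coeffs = [0.0] * (K1 + 1 + K2)
    feedback = [0.0] * K2                    # I_{k-1}, ..., I_{k-K2}
    decisions = []
    for k in range(len(received) - K1):
        feedforward = [received[k + K1 - j] for j in range(K1 + 1)]  # v_{k+K1}..v_k
        V = feedforward + feedback
        estimate = sum(c * x for c, x in zip(coeffs, V))
        decided = 1.0 if estimate >= 0 else -1.0     # assumed binary detector
        reference = known[k] if k < n_train else decided
        error = reference - estimate
        # C(k+1) = C(k) + step * error * conj(V_k)
        coeffs = [c + step * error * x.conjugate() for c, x in zip(coeffs, V)]
        feedback = ([reference] + feedback)[:K2]
        decisions.append(decided)
    return coeffs, decisions

def demo(num_symbols=1000, n_train=500, seed=3):
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols)]
    # Hypothetical noise-free channel: v_k = I_k + 0.5 I_{k-1}
    received = [symbols[k] + (0.5 * symbols[k - 1] if k > 0 else 0.0)
                for k in range(num_symbols)]
    coeffs, decisions = lms_dfe(received, symbols, K1=0, K2=1,
                                step=0.05, n_train=n_train)
    return coeffs, decisions, symbols
```

For this channel the error can be driven to zero with a feedforward tap of 1 and a feedback tap of -0.5, so the sketch makes it easy to check that the recursion converges before the switch to decision-directed mode.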


10.3 ADAPTIVE EQUALIZATION OF TRELLIS-CODED SIGNALS

Bandwidth efficient trellis-coded modulation that was described in Section 8.12 is fre- 
quently used in digital communications over telephone channels to reduce the required 
SNR per bit for achieving a specified error rate. Channel distortion of the trellis-coded 
signal forces us to use adaptive equalization in order to reduce the intersymbol inter- 
ference. The output of the equalizer is then fed to the Viterbi decoder, which performs 
soft-decision decoding of the trellis-coded signal.

FIGURE 10.3-1

Adjustment of equalizer based on 
tentative decisions. 


The question that arises regarding such a receiver is, how do we adapt the equalizer 
in a data transmission mode? One possibility is to have the equalizer make its own 
decisions at its output solely for the purpose of generating an error signal for adjusting its
tap coefficients, as shown in the block diagram in Figure 10.3-1. The problem with this
approach is that such decisions are generally unreliable, since the pre-decoding coded 
symbol SNR is relatively low. A high error rate would cause a significant degradation 
in the operation of the equalizer, which would ultimately affect the reliability of the 
decisions at the output of the decoder. The more desirable alternative is to use the 
post-decoding decisions from the Viterbi decoder, which are much more reliable, to 
continuously adapt the equalizer. This approach is certainly preferable and viable when 
a linear equalizer is used prior to the Viterbi decoder. The decoding delay inherent in 
the Viterbi decoder can be overcome by introducing an identical delay in the tap weight 
adjustment of the equalizer coefficients as shown in Figure 10.3-2. The major price that 
must be paid for the added delay is that the step-size parameter in the LMS algorithm 
must be reduced, as described by Long et al. (1987, 1989), in order to achieve stability 
in the algorithm. 

In channels with severe ISI, the linear equalizer is no longer adequate for compensating
the channel intersymbol interference. Instead, we would like to use a DFE.
But the DFE requires reliable decisions in its feedback filter in order to cancel out
the intersymbol interference from previously detected symbols. Tentative decisions
prior to decoding would be highly unreliable and, hence, inappropriate. Unfortunately,
the conventional DFE cannot be cascaded with the Viterbi algorithm in which post-decoding
decisions from the decoder are fed back to the DFE, because the decoding
delay means that those decisions are not yet available when the feedback filter needs them.

FIGURE 10.3-2

Adjustment of equalizer based on decisions from the Viterbi decoder.

FIGURE 10.3-3

Use of predictive DFE with interleaving and trellis-coded modulation.

One alternative is to use the predictive DFE described in Section 9.5-3. In order 
to accommodate for the decoding delay as it affects the linear predictor, we introduce 
a periodic interleaver/deinterleaver pair that has the same delay as the Viterbi decoder 
and, thus, makes it possible to generate the appropriate error signal to the predictor as 
illustrated in the block diagram of Figure 10.3-3. The way in which a predictive DFE 
can be combined with Viterbi decoding to equalize trellis-coded signals is described and 
analyzed by Eyuboglu (1988). This same idea has been carried over to the equalization 
of fading multipath channels by Zhou et al. (1988, 1990), but the structure of the DFE 
was modified to use recursive least-squares lattice-type filters, which provide faster
adaptation to the time variations encountered in the channel. 

Another approach that is effective in wireline channels, where the channel impulse 
response is essentially time invariant, is to place the feedback section of the DFE at the 
transmitter and, thus, eliminate the tail (postcursors) of the channel response prior to 
transmission. This is the approach previously described in Section 9.5-4, in which the 
information sequence is precoded using the Tomlinson-Harashima precoding scheme. 
Generally, this approach is implemented by sending a channel probe signal to measure 
the channel frequency or impulse response at the receiver and, thus, to inform the 
transmitter of the channel response in order to synthesize the precoder. An adaptive, 
fractionally spaced linear equalizer is implemented at the receiver, which serves as the 
feedforward filter of the DFE and, thus, compensates for any small time variations in
the channel response. 

Reduced-state Viterbi detection algorithms. From a performance viewpoint, the
best method for detecting a TCM signal sequence that is corrupted by ISI is to model
the ISI and the trellis code jointly by a single finite state machine and to use the
Viterbi algorithm on the combined trellis, as described in the papers by Chevillat and
Eleftheriou (1988, 1989), Eyuboglu et al. (1988, 1989), and Wesolowski (1987b). By
using a whitened matched filter (WMF) as described previously for the receiver front
end, the model for the combined trellis encoder and ISI channel filter is illustrated in
Figure 10.3-4, where the channel filter F(z) is minimum phase. Thus, a TCM encoder
that has S states and employs a signal constellation with $2^{m+1}$ signal points has a
combined TCM/ISI trellis that has $S2^{mL}$ states and $2^{m}$ transitions (branches) emerging
from each state. The states of the combined finite state machine may be denoted as

$$S_n = (I_{n-L}, I_{n-L+1}, \ldots, I_{n-1}, \theta_n) \tag{10.3-1}$$

where $\{I_n\}$ is the information symbol sequence and where $\theta_n$ is the encoder state.

FIGURE 10.3-4

Model of TCM and ISI channel. (The channel output is $v_k = \sum_{i=0}^{L} f_i I_{k-i} + \eta_k$,
where $\eta_k$ is AWGN.)

The Viterbi decoder operates on the combined ISI and code trellis in the conven- 
tional way, by computing the branch metrics 


$$\left| v_k - \sum_{i=0}^{L} f_i I_{k-i} \right|^2 \tag{10.3-2}$$


and incrementing the corresponding path metrics. 

Clearly, the complexity of the Viterbi detector becomes prohibitively large when
the span L of the ISI is large. In such a case, the decoder complexity can be reduced
as described in Section 9.6, by truncating the effective channel memory to $L_0$ terms.
With truncation, the combined TCM/ISI trellis has the $S2^{mL_0}$ states

$$S_n^{L_0} = (I_{n-L_0}, I_{n-L_0+1}, \ldots, I_{n-1}, \theta_n) \tag{10.3-3}$$

where $1 \le L_0 \le L$.

Thus, when $L_0 = 1$, the Viterbi algorithm operates directly on the TCM coded
trellis and the L ISI terms are estimated and canceled. By selecting $L_0 > 1$, some
ISI terms are kept while $L + 1 - L_0$ terms are canceled. To reduce the performance
degradation due to tentative decisions in the Viterbi detector, the ISI cancelation is
introduced into the branch metric computations using local feedback, as previously
described in Section 9.6. Thus, the branch metrics computed in the Viterbi detector
take the form

$$\left| v_k - \sum_{i=0}^{L_0-1} f_i I_{k-i} - \sum_{i=L_0}^{L} f_i \hat{I}_{k-i}(S_n^{L_0}) \right|^2 \tag{10.3-4}$$

where $\hat{I}_{k-i}(S_n^{L_0})$ denotes the estimated ISI term due to the symbols $\{I_{k-i},\ L_0 \le i \le L\}$
involved in the truncation of the ISI based on local feedback.
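
As a rough illustration of the bookkeeping in Equation 10.3-4, the sketch below evaluates one reduced-state branch metric and the trellis size $S2^{mL}$. It is only a sketch under stated assumptions: the function names are hypothetical, the kept symbols are taken to be those hypothesized by the branch and state, and the fed-back symbols are taken to be the tentative decisions stored with the survivor path of that state.

```python
def combined_trellis_states(S, m, memory):
    """Number of states S * 2**(m * memory) of the combined TCM/ISI trellis."""
    return S * 2 ** (m * memory)

def reduced_state_branch_metric(v_k, f, kept, fed_back):
    """Branch metric of Equation 10.3-4.

    f        : channel taps f_0, ..., f_L
    kept     : I_k, I_{k-1}, ..., I_{k-L0+1} (hypothesized by the branch/state)
    fed_back : estimates of I_{k-L0}, ..., I_{k-L} taken from the survivor path
    """
    L0 = len(kept)
    assert L0 + len(fed_back) == len(f), "kept + fed_back must cover all taps"
    isi_kept = sum(f[i] * kept[i] for i in range(L0))
    isi_canceled = sum(f[L0 + i] * fed_back[i] for i in range(len(fed_back)))
    return abs(v_k - isi_kept - isi_canceled) ** 2

# Full trellis versus a truncated one for S = 4, m = 2, L = 3
full_states = combined_trellis_states(4, 2, 3)
reduced_states = combined_trellis_states(4, 2, 1)
```

The state count grows exponentially with the retained memory, which is why moving ISI terms from the kept sum to the canceled sum reduces complexity at the cost of relying on tentative decisions.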
In the case of an unknown channel characteristic, both the WMF and the channel
estimator of F(z) must be determined adaptively. This may be accomplished by adapting
a complex-valued baseband FSE for the WMF and the channel estimator described
previously in Section 10.1-7. Thus, a training sequence may be used for initial ad- 
justment and decision-directed estimation may continue following the initial training 
sequence. The LMS algorithm may be used in both the training and decision-directed 
modes. Simulation results given by Chevillat and Eleftheriou (1989) demonstrate the 
superior performance of this adaptive WMF/reduced-state Viterbi detector compared 
to the combination of a linear equalizer followed by a Viterbi detector. 


10.4 RECURSIVE LEAST-SQUARES ALGORITHMS FOR ADAPTIVE EQUALIZATION

The LMS algorithm that we described in Sections 10.1 and 10.2 for adaptively adjusting
the tap coefficients of a linear equalizer or a DFE is basically a (stochastic) steepest- 
descent algorithm in which the true gradient vector is approximated by an estimate 
obtained directly from the data. 

The major advantage of the steepest-descent algorithm lies in its computational
simplicity. However, the price paid for the simplicity is slow convergence, especially
when the channel characteristics result in an autocorrelation matrix $\boldsymbol{\Gamma}$ whose eigenvalues
have a large spread, i.e., $\lambda_{\max}/\lambda_{\min} \gg 1$. Viewed in another way, the gradient
algorithm has only a single adjustable parameter for controlling the convergence rate,
namely, the parameter $\Delta$. Consequently the slow convergence is due to this fundamental
limitation. Two simple methods for increasing the convergence rate to some extent
were described in Section 10.1-5.

In order to obtain faster convergence, it is necessary to devise more complex algorithms
involving additional parameters. In particular, if the matrix $\boldsymbol{\Gamma}$ is N x N and has
eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_N$, we may use an algorithm that contains N parameters, one
for each of the eigenvalues. The optimum selection of these parameters to achieve rapid
convergence is a topic of this section.

In deriving faster converging algorithms, we shall adopt a least-squares approach. 
Thus, we shall deal directly with the received data in minimizing the quadratic per- 
formance index, whereas previously we minimized the expected value of the squared 
error. Put simply, this means that the performance index is expressed in terms of a time 
average instead of a statistical average. 

It is convenient to express the recursive least-squares algorithms in matrix form. 
Hence, we shall define a number of vectors and matrices that are needed in this devel- 
opment. In so doing, we shall change the notation slightly. Specifically, the estimate of 
the information symbol at time t, where t is an integer, from a linear equalizer is now
expressed as

$$\hat{I}(t) = \sum_{j=-K}^{K} c_j(t-1)\, v_{t-j}$$

By changing the index j on $c_j(t-1)$ to run from j = 0 to j = N - 1 and simultaneously
defining

$$y(t) = v_{t+K}$$

the estimate $\hat{I}(t)$ becomes

$$\hat{I}(t) = \sum_{j=0}^{N-1} c_j(t-1)\, y(t-j) = \mathbf{C}_N^{t}(t-1)\mathbf{Y}_N(t) \tag{10.4-1}$$

where $\mathbf{C}_N(t-1)$ and $\mathbf{Y}_N(t)$ are, respectively, the column vectors of the equalizer
coefficients $c_j(t-1)$, j = 0, 1, ..., N - 1, and the input signals $y(t-j)$, j =
0, 1, 2, ..., N - 1.

Similarly, in the decision-feedback equalizer, we have tap coefficients $c_j(t)$, j =
0, 1, ..., N - 1, where the first $K_1 + 1$ are the coefficients of the feedforward filter
and the remaining $K_2 = N - K_1 - 1$ are the coefficients of the feedback filter. The data in
the estimate $\hat{I}(t)$ is $v_{t+K_1}, \ldots, v_{t+1}, v_t, \tilde{I}_{t-1}, \ldots, \tilde{I}_{t-K_2}$, where $\tilde{I}_{t-j}$, $1 \le j \le K_2$, denote
the decisions on previously detected symbols. In this development, we neglect the effect
of decision errors in the algorithms. Hence, we assume that $\tilde{I}_{t-j} = I_{t-j}$, $1 \le j \le K_2$.
For notational convenience, we also define

$$y(t-j) = \begin{cases} v_{t+K_1-j} & (0 \le j \le K_1) \\ I_{t+K_1-j} & (K_1 < j \le N-1) \end{cases} \tag{10.4-2}$$

Thus,

$$\mathbf{Y}_N(t) = [y(t) \ y(t-1) \cdots y(t-N+1)]^{t} = [v_{t+K_1} \cdots v_{t+1} \ v_t \ I_{t-1} \cdots I_{t-K_2}]^{t} \tag{10.4-3}$$


10.4-1 Recursive Least-Squares (Kalman) Algorithm 

The recursive least-squares (RLS) estimation of $I(t)$ may be formulated as follows.
Suppose we have observed the vectors $\mathbf{Y}_N(n)$, n = 0, 1, ..., t, and we wish to determine
the coefficient vector $\mathbf{C}_N(t)$ of the equalizer (linear or decision-feedback) that
minimizes the time-average weighted squared error

$$\mathcal{E}_N^{LS} = \sum_{n=0}^{t} w^{t-n} |e_N(n,t)|^2 \tag{10.4-4}$$

where the error is defined as

$$e_N(n,t) = I(n) - \mathbf{C}_N^{t}(t)\mathbf{Y}_N(n) \tag{10.4-5}$$

and w represents a weighting factor 0 < w < 1. Thus we introduce exponential
weighting into past data, which is appropriate when the channel characteristics are
time-variant. Minimization of $\mathcal{E}_N^{LS}$ with respect to the coefficient vector $\mathbf{C}_N(t)$ yields
the set of linear equations

$$\mathbf{R}_N(t)\mathbf{C}_N(t) = \mathbf{D}_N(t) \tag{10.4-6}$$

where $\mathbf{R}_N(t)$ is the signal correlation matrix defined as

$$\mathbf{R}_N(t) = \sum_{n=0}^{t} w^{t-n}\, \mathbf{Y}_N^{*}(n)\mathbf{Y}_N^{t}(n) \tag{10.4-7}$$

and $\mathbf{D}_N(t)$ is the cross-correlation vector

$$\mathbf{D}_N(t) = \sum_{n=0}^{t} w^{t-n}\, I(n)\mathbf{Y}_N^{*}(n) \tag{10.4-8}$$

The solution of Equation 10.4-6 is

$$\mathbf{C}_N(t) = \mathbf{R}_N^{-1}(t)\mathbf{D}_N(t) \tag{10.4-9}$$

The matrix $\mathbf{R}_N(t)$ is akin to the statistical autocorrelation matrix $\boldsymbol{\Gamma}_N$, while the vector
$\mathbf{D}_N(t)$ is akin to the cross-correlation vector $\boldsymbol{\xi}_N$, defined previously. We emphasize,
however, that $\mathbf{R}_N(t)$ is not a Toeplitz matrix. We also should mention that, for small
values of t, $\mathbf{R}_N(t)$ may be ill conditioned; hence, it is customary to initially add the
matrix $\delta\mathbf{I}_N$ to $\mathbf{R}_N(t)$, where $\delta$ is a small positive constant and $\mathbf{I}_N$ is the identity matrix.
With exponential weighting into the past, the effect of adding $\delta\mathbf{I}_N$ dissipates with time.

Now suppose we have the solution in Equation 10.4-9 for time t - 1, i.e., $\mathbf{C}_N(t-1)$,
and we wish to compute $\mathbf{C}_N(t)$. It is inefficient and, hence, impractical to solve the set
of N linear equations for each new signal component that is received. To avoid this, we
proceed as follows. First, $\mathbf{R}_N(t)$ may be computed recursively as

$$\mathbf{R}_N(t) = w\mathbf{R}_N(t-1) + \mathbf{Y}_N^{*}(t)\mathbf{Y}_N^{t}(t) \tag{10.4-10}$$

We call Equation 10.4-10 the time-update equation for $\mathbf{R}_N(t)$.

Since the inverse of $\mathbf{R}_N(t)$ is needed in Equation 10.4-9, we use the matrix-inverse
identity

$$\mathbf{R}_N^{-1}(t) = \frac{1}{w}\left[ \mathbf{R}_N^{-1}(t-1) - \frac{\mathbf{R}_N^{-1}(t-1)\mathbf{Y}_N^{*}(t)\mathbf{Y}_N^{t}(t)\mathbf{R}_N^{-1}(t-1)}{w + \mathbf{Y}_N^{t}(t)\mathbf{R}_N^{-1}(t-1)\mathbf{Y}_N^{*}(t)} \right] \tag{10.4-11}$$

Thus $\mathbf{R}_N^{-1}(t)$ may be computed recursively according to Equation 10.4-11.

For convenience, we define $\mathbf{P}_N(t) = \mathbf{R}_N^{-1}(t)$. It is also convenient to define an
N-dimensional vector, called the Kalman gain vector, as

$$\mathbf{K}_N(t) = \frac{1}{w + \mu_N(t)}\, \mathbf{P}_N(t-1)\mathbf{Y}_N^{*}(t) \tag{10.4-12}$$

where $\mu_N(t)$ is a scalar defined as

$$\mu_N(t) = \mathbf{Y}_N^{t}(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^{*}(t) \tag{10.4-13}$$

With these definitions, Equation 10.4-11 becomes

$$\mathbf{P}_N(t) = \frac{1}{w}\left[ \mathbf{P}_N(t-1) - \mathbf{K}_N(t)\mathbf{Y}_N^{t}(t)\mathbf{P}_N(t-1) \right] \tag{10.4-14}$$

Suppose we postmultiply both sides of Equation 10.4-14 by $\mathbf{Y}_N^{*}(t)$. Then

$$\begin{aligned}
\mathbf{P}_N(t)\mathbf{Y}_N^{*}(t) &= \frac{1}{w}\left[ \mathbf{P}_N(t-1)\mathbf{Y}_N^{*}(t) - \mathbf{K}_N(t)\mathbf{Y}_N^{t}(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^{*}(t) \right] \\
&= \frac{1}{w}\left\{ [w + \mu_N(t)]\mathbf{K}_N(t) - \mathbf{K}_N(t)\mu_N(t) \right\} \\
&= \mathbf{K}_N(t)
\end{aligned} \tag{10.4-15}$$

Therefore, the Kalman gain vector may also be defined as $\mathbf{P}_N(t)\mathbf{Y}_N^{*}(t)$.

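Now we use the matrix inversion identity to derive an equation for obtaining $\mathbf{C}_N(t)$
from $\mathbf{C}_N(t-1)$. Since

$$\mathbf{C}_N(t) = \mathbf{P}_N(t)\mathbf{D}_N(t)$$

and

$$\mathbf{D}_N(t) = w\mathbf{D}_N(t-1) + I(t)\mathbf{Y}_N^{*}(t) \tag{10.4-16}$$

we have

$$\begin{aligned}
\mathbf{C}_N(t) &= \frac{1}{w}\left[ \mathbf{P}_N(t-1) - \mathbf{K}_N(t)\mathbf{Y}_N^{t}(t)\mathbf{P}_N(t-1) \right]\left[ w\mathbf{D}_N(t-1) + I(t)\mathbf{Y}_N^{*}(t) \right] \\
&= \mathbf{P}_N(t-1)\mathbf{D}_N(t-1) + \frac{1}{w} I(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^{*}(t) \\
&\quad - \mathbf{K}_N(t)\mathbf{Y}_N^{t}(t)\mathbf{P}_N(t-1)\mathbf{D}_N(t-1) \\
&\quad - \frac{1}{w} I(t)\mathbf{K}_N(t)\mathbf{Y}_N^{t}(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^{*}(t) \\
&= \mathbf{C}_N(t-1) + \mathbf{K}_N(t)\left[ I(t) - \mathbf{Y}_N^{t}(t)\mathbf{C}_N(t-1) \right]
\end{aligned} \tag{10.4-17}$$

Note that $\mathbf{Y}_N^{t}(t)\mathbf{C}_N(t-1)$ is the output of the equalizer at time t, i.e.,

$$\hat{I}(t) = \mathbf{Y}_N^{t}(t)\mathbf{C}_N(t-1) \tag{10.4-18}$$

and

$$e_N(t, t-1) = I(t) - \hat{I}(t) \equiv e_N(t) \tag{10.4-19}$$

is the error between the desired symbol and the estimate. Hence, $\mathbf{C}_N(t)$ is updated
recursively according to the relation

$$\mathbf{C}_N(t) = \mathbf{C}_N(t-1) + \mathbf{K}_N(t)e_N(t) \tag{10.4-20}$$

The residual MSE resulting from this optimization is

$$\mathcal{E}_{N\,\min}^{LS} = \sum_{n=0}^{t} w^{t-n} |I(n)|^2 - \mathbf{C}_N^{t}(t)\mathbf{D}_N^{*}(t) \tag{10.4-21}$$

To summarize, suppose we have $\mathbf{C}_N(t-1)$ and $\mathbf{P}_N(t-1)$. When a new signal
component is received, we have $\mathbf{Y}_N(t)$. Then the recursive computation for the time
update of $\mathbf{C}_N(t)$ and $\mathbf{P}_N(t)$ proceeds as follows:

• Compute output:

$$\hat{I}(t) = \mathbf{Y}_N^{t}(t)\mathbf{C}_N(t-1)$$

• Compute error:

$$e_N(t) = I(t) - \hat{I}(t)$$

• Compute Kalman gain vector:

$$\mathbf{K}_N(t) = \frac{\mathbf{P}_N(t-1)\mathbf{Y}_N^{*}(t)}{w + \mathbf{Y}_N^{t}(t)\mathbf{P}_N(t-1)\mathbf{Y}_N^{*}(t)}$$

• Update inverse of the correlation matrix:

$$\mathbf{P}_N(t) = \frac{1}{w}\left[ \mathbf{P}_N(t-1) - \mathbf{K}_N(t)\mathbf{Y}_N^{t}(t)\mathbf{P}_N(t-1) \right]$$

• Update coefficients:

$$\mathbf{C}_N(t) = \mathbf{C}_N(t-1) + \mathbf{K}_N(t)e_N(t) = \mathbf{C}_N(t-1) + \mathbf{P}_N(t)\mathbf{Y}_N^{*}(t)e_N(t) \tag{10.4-22}$$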


The algorithm described by Equation 10.4-22 is called the RLS direct form or
Kalman algorithm. It is appropriate when the equalizer has a transversal (direct-form)
structure.
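
The five steps summarized in Equation 10.4-22 can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names, the initial value $\mathbf{P}_N = \delta^{-1}\mathbf{I}_N$ (the inverse of the customary $\delta\mathbf{I}_N$ initialization), and the hypothetical target coefficients used in `demo` are all assumptions made for the example.

```python
import random

def rls_update(C, P, Y, desired, w=0.99):
    """One time update of the RLS (Kalman) algorithm, Equation 10.4-22.

    C: coefficient vector C_N(t-1); P: inverse correlation matrix P_N(t-1);
    Y: data vector Y_N(t); desired: symbol I(t); w: weighting factor.
    """
    N = len(C)
    Yc = [y.conjugate() for y in Y]                       # Y_N^*(t)
    estimate = sum(Y[j] * C[j] for j in range(N))         # I_hat(t) = Y^t C(t-1)
    error = desired - estimate                            # e_N(t)
    PYc = [sum(P[i][j] * Yc[j] for j in range(N)) for i in range(N)]  # P(t-1) Y^*
    mu = sum(Y[i] * PYc[i] for i in range(N))             # mu_N(t) = Y^t P(t-1) Y^*
    K = [x / (w + mu) for x in PYc]                       # Kalman gain vector
    YtP = [sum(Y[i] * P[i][j] for i in range(N)) for j in range(N)]   # Y^t P(t-1)
    P_new = [[(P[i][j] - K[i] * YtP[j]) / w for j in range(N)] for i in range(N)]
    C_new = [C[i] + K[i] * error for i in range(N)]
    return C_new, P_new, error

def demo(num_samples=200, delta=1e-3, seed=1):
    """Recover hypothetical coefficients g from noise-free data."""
    rng = random.Random(seed)
    g = [0.5, -0.3, 0.2]                                  # assumed target coefficients
    N = len(g)
    C = [0.0] * N
    P = [[(1.0 / delta if i == j else 0.0) for j in range(N)] for i in range(N)]
    Y = [0.0] * N
    for _ in range(num_samples):
        Y = [rng.choice((-1.0, 1.0))] + Y[:-1]            # y(t), y(t-1), ...
        desired = sum(g[j] * Y[j] for j in range(N))
        C, P, _ = rls_update(C, P, Y, desired)
    return C, g
```

Each update costs on the order of $N^2$ operations, dominated by the update of $\mathbf{P}_N(t)$, and every coefficient receives its own gain through $\mathbf{K}_N(t)$.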

Note that the equalizer coefficients change with time by an amount equal to the error
$e_N(t)$ multiplied by the Kalman gain vector $\mathbf{K}_N(t)$. Since $\mathbf{K}_N(t)$ is N-dimensional, each
tap coefficient in effect is controlled by one of the elements of $\mathbf{K}_N(t)$. Consequently
rapid convergence is obtained. In contrast, the steepest-descent algorithm, expressed in
our present notation, is

$$\mathbf{C}_N(t) = \mathbf{C}_N(t-1) + \Delta\mathbf{Y}_N^{*}(t)e_N(t) \tag{10.4-23}$$

and the only variable parameter is the step size $\Delta$.

Figure 10.4-1 illustrates the initial convergence rate of these two algorithms for a
channel with fixed parameters $f_0 = 0.26$, $f_1 = 0.93$, $f_2 = 0.26$, and a linear equalizer
with 11 taps. The eigenvalue ratio for this channel is $\lambda_{\max}/\lambda_{\min} = 11$. All the equalizer
coefficients were initialized to zero. The steepest-descent algorithm was implemented
with $\Delta = 0.020$. The superiority of the Kalman algorithm is clearly evident. This is
especially important in a time-variant channel. For example, the time variations in the
characteristics of an (ionospheric) high-frequency (HF) radio channel are too rapid to
be equalized by the gradient algorithm, but the Kalman algorithm adapts sufficiently
rapidly to track such variations.

In spite of its superior convergence performance, the Kalman algorithm described
above has two disadvantages. One is its complexity. The second is its sensitivity to
roundoff noise that accumulates due to the recursive computations. The latter may
cause instabilities in the algorithm.

FIGURE 10.4-1

Comparison of convergence rate for the
Kalman and gradient algorithms.

The number of computations or operations (multiplications, divisions, and subtractions)
in computing the variables in Equation 10.4-22 is proportional to $N^2$. Most
of these operations are involved in the updating of $\mathbf{P}_N(t)$. This part of the computation
is also susceptible to roundoff noise. To remedy that problem, algorithms have been
developed that avoid the computation of $\mathbf{P}_N(t)$ according to Equation 10.4-14. The
basis of these algorithms lies in the decomposition of $\mathbf{P}_N(t)$ in the form

$$\mathbf{P}_N(t) = \mathbf{S}_N(t)\boldsymbol{\Lambda}_N(t)\mathbf{S}_N^{t}(t) \tag{10.4-24}$$

where $\mathbf{S}_N(t)$ is a lower-triangular matrix whose diagonal elements are unity, and $\boldsymbol{\Lambda}_N(t)$
is a diagonal matrix. Such a decomposition is called a square-root factorization (see
Bierman, 1977). This factorization is described in Appendix D. In a square-root algorithm,
$\mathbf{P}_N(t)$ is not updated as in Equation 10.4-14 nor is it computed. Instead, the time
updating is performed on $\mathbf{S}_N(t)$ and $\boldsymbol{\Lambda}_N(t)$.

Square-root algorithms are frequently used in control systems applications in which 
Kalman filtering is involved. In digital communications, the square-root Kalman algo- 
rithm has been implemented in a decision-feedback-equalized PSK modem designed 
to transmit at high speed over high-frequency radio channels with a nominal 3-kHz 
bandwidth. This algorithm is described in the paper by Hsu (1982). It has a computational
complexity of $1.5N^2 + 6.5N$ (complex-valued multiplications and divisions per
output symbol). It is also numerically stable and exhibits good numerical properties. 
For a detailed discussion of square-root algorithms in sequential estimation, the reader 
is referred to the book by Bierman (1977). 

It is also possible to derive RLS algorithms with computational complexities that 
grow linearly with the number N of equalizer coefficients. Such algorithms are generally 
called fast RLS algorithms and have been described in the papers by Carayannis et al. 
(1983), Cioffi and Kailath (1984), and Slock and Kailath (1991). 

Another class of recursive least-squares algorithms for adaptive equalization is
based on the lattice equalizer structure. Below, we derive the lattice filter structure
from the transversal filter structure and, thus, demonstrate the equivalence of the two
structures.

10.4-2 Linear Prediction and the Lattice Filter

In this section we develop the connection between a linear FIR filter and a lattice
filter. This connection is most easily established by considering the problem of linear
prediction of a signal sequence.

The linear prediction problem may be stated as follows: given a set of data
$y(t-1), y(t-2), \ldots, y(t-p)$, predict the value of the next data point $y(t)$. The
predictor of order p is

$$\hat{y}(t) = \sum_{k=1}^{p} a_{pk}\, y(t-k) \tag{10.4-25}$$

Minimization of the MSE, defined as

$$\mathcal{E}_p = E[y(t) - \hat{y}(t)]^2 = E\left[ y(t) - \sum_{k=1}^{p} a_{pk}\, y(t-k) \right]^2 \tag{10.4-26}$$

with respect to the predictor coefficients $\{a_{pk}\}$ yields the set of linear equations

$$\sum_{k=1}^{p} a_{pk} R(k-l) = R(l), \qquad l = 1, 2, \ldots, p \tag{10.4-27}$$

where

$$R(l) = E[y(t)y(t+l)]$$


These are called the normal equations or the Yule-Walker equations. 

The matrix R with elements $R(k-l)$ is a Toeplitz matrix, and, hence, the Levinson-Durbin
algorithm provides an efficient means for solving the linear equations recursively,
starting with a first-order predictor and proceeding recursively to the solution of
the coefficients for the predictor of order p. The recursive relations for the Levinson-Durbin
algorithm are (see Levinson (1947) and Durbin (1959))

$$\begin{aligned}
a_{11} &= \frac{R(1)}{R(0)}, \qquad \mathcal{E}_0 = R(0) \\
a_{mm} &= \frac{R(m) - \mathbf{A}_{m-1}^{t}\mathbf{R}_{m-1}^{r}}{\mathcal{E}_{m-1}} \\
a_{mk} &= a_{m-1\,k} - a_{mm}\, a_{m-1\,m-k} \\
\mathcal{E}_m &= \mathcal{E}_{m-1}(1 - a_{mm}^2)
\end{aligned} \tag{10.4-28}$$

for m = 1, 2, ..., p, where the vectors $\mathbf{A}_{m-1}$ and $\mathbf{R}_{m-1}^{r}$ are defined as

$$\mathbf{A}_{m-1} = [a_{m-1\,1} \ a_{m-1\,2} \cdots a_{m-1\,m-1}]^{t}$$

$$\mathbf{R}_{m-1}^{r} = [R(m-1) \ R(m-2) \cdots R(1)]^{t}$$
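
The recursion in Equation 10.4-28 translates almost line by line into code. The following is a minimal sketch for real-valued autocorrelations; the function name and the return convention are assumptions made for illustration.

```python
def levinson_durbin(R, p):
    """Solve the normal equations (10.4-27) for orders 1..p via Equation 10.4-28.

    R[l] is the autocorrelation at lag l (real-valued, R[0] > 0).
    Returns (a, energies), where a[k-1] = a_{pk} and energies[m] = E_m.
    """
    a = []                       # coefficients of the current predictor order
    energies = [R[0]]            # E_0 = R(0)
    for m in range(1, p + 1):
        # R(m) - A_{m-1}^t R^r_{m-1} = R(m) - sum_k a_{m-1,k} R(m-k)
        numerator = R[m] - sum(a[k - 1] * R[m - k] for k in range(1, m))
        a_mm = numerator / energies[-1]
        # a_{mk} = a_{m-1,k} - a_mm * a_{m-1,m-k}, k = 1, ..., m-1
        a = [a[k - 1] - a_mm * a[m - k - 1] for k in range(1, m)] + [a_mm]
        energies.append(energies[-1] * (1.0 - a_mm ** 2))
    return a, energies

# Example: autocorrelation R(l) = 0.5**|l| of a first-order autoregressive process
coeffs, energies = levinson_durbin([0.5 ** lag for lag in range(4)], p=3)
```

For this autocorrelation the third-order predictor reduces to the first-order one, with coefficients (0.5, 0, 0), and the prediction error energy stays at 0.75 for every order beyond the first.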
The linear prediction filter of order m may be realized as a transversal (FIR) filter
with transfer function

$$A_m(z) = 1 - \sum_{k=1}^{m} a_{mk} z^{-k} \tag{10.4-29}$$

Its input is the data $\{y(t)\}$ and its output is the error $e(t) = y(t) - \hat{y}(t)$. The prediction
filter can also be realized in the form of a lattice, as we now demonstrate.

Our starting point is the use of the Levinson-Durbin algorithm for the predictor
coefficients $a_{mk}$ in Equation 10.4-29. This substitution yields

$$\begin{aligned}
A_m(z) &= 1 - \sum_{k=1}^{m-1} \left( a_{m-1\,k} - a_{mm}\, a_{m-1\,m-k} \right) z^{-k} - a_{mm} z^{-m} \\
&= 1 - \sum_{k=1}^{m-1} a_{m-1\,k} z^{-k} - a_{mm} z^{-m}\left( 1 - \sum_{k=1}^{m-1} a_{m-1\,k} z^{k} \right) \\
&= A_{m-1}(z) - a_{mm} z^{-m} A_{m-1}(z^{-1})
\end{aligned} \tag{10.4-30}$$

Thus we have the transfer function of the mth-order predictor in terms of the transfer
function of the (m - 1)th-order predictor.

Now suppose we define a filter with transfer function $G_m(z)$ as

$$G_m(z) = z^{-m} A_m(z^{-1}) \tag{10.4-31}$$

Then Equation 10.4-30 may be expressed as

$$A_m(z) = A_{m-1}(z) - a_{mm} z^{-1} G_{m-1}(z) \tag{10.4-32}$$

Note that $G_{m-1}(z)$ represents a transversal filter with tap coefficients $(-a_{m-1\,m-1},
-a_{m-1\,m-2}, \ldots, -a_{m-1\,1}, 1)$, while the coefficients of $A_{m-1}(z)$ are exactly the same
except that they are given in reverse order.

More insight into the relationship between $A_m(z)$ and $G_m(z)$ can be obtained by
computing the output of these two filters to an input sequence $y(t)$. Using z-transform
relations, we have

$$A_m(z)Y(z) = A_{m-1}(z)Y(z) - a_{mm} z^{-1} G_{m-1}(z)Y(z) \tag{10.4-33}$$

We define the outputs of the filters as

$$F_m(z) = A_m(z)Y(z), \qquad B_m(z) = G_m(z)Y(z) \tag{10.4-34}$$

Then Equation 10.4-33 becomes

$$F_m(z) = F_{m-1}(z) - a_{mm} z^{-1} B_{m-1}(z) \tag{10.4-35}$$

In the time domain, the relation in Equation 10.4-35 becomes

$$f_m(t) = f_{m-1}(t) - a_{mm} b_{m-1}(t-1), \qquad m \ge 1 \tag{10.4-36}$$

where

$$f_m(t) = y(t) - \sum_{k=1}^{m} a_{mk}\, y(t-k) \tag{10.4-37}$$

$$b_m(t) = y(t-m) - \sum_{k=1}^{m} a_{mk}\, y(t-m+k) \tag{10.4-38}$$

To elaborate, $f_m(t)$ in Equation 10.4-37 represents the error of an mth-order forward
predictor, while $b_m(t)$ represents the error of an mth-order backward predictor.

The relation in Equation 10.4-36 is one of two that specify a lattice filter. The
second relation is obtained from $G_m(z)$ as follows:

$$\begin{aligned}
G_m(z) &= z^{-m} A_m(z^{-1}) \\
&= z^{-m}\left[ A_{m-1}(z^{-1}) - a_{mm} z^{m} A_{m-1}(z) \right] \\
&= z^{-1} G_{m-1}(z) - a_{mm} A_{m-1}(z)
\end{aligned} \tag{10.4-39}$$

Now, if we multiply both sides of Equation 10.4-39 by $Y(z)$ and express the result in
terms of $F_m(z)$ and $B_m(z)$ using the definitions in Equation 10.4-34, we obtain

$$B_m(z) = z^{-1} B_{m-1}(z) - a_{mm} F_{m-1}(z) \tag{10.4-40}$$

By transforming Equation 10.4-40 into the time domain, we obtain the second relation
that corresponds to the lattice filter, namely,

$$b_m(t) = b_{m-1}(t-1) - a_{mm} f_{m-1}(t), \qquad m \ge 1 \tag{10.4-41}$$

The initial condition is

$$f_0(t) = b_0(t) = y(t) \tag{10.4-42}$$

The lattice filter described by the recursive relations in Equations 10.4-36 and 10.4-41
is illustrated in Figure 10.4-2. Each stage is characterized by its own multiplication
factor $\{a_{ii}\}$, i = 1, 2, ..., m, which is defined in the Levinson-Durbin algorithm. The
forward and backward errors $f_m(t)$ and $b_m(t)$ are usually called the residuals. The mean
square value of these residuals is

$$\mathcal{E}_m = E[f_m^2(t)] = E[b_m^2(t)] \tag{10.4-43}$$



FIGURE 10.4-2

A lattice filter. (Panels (a) and (b).)

$\mathcal{E}_m$ is given recursively, as indicated in the Levinson-Durbin algorithm, by

$$\mathcal{E}_m = \mathcal{E}_0 \prod_{i=1}^{m} (1 - a_{ii}^2) \tag{10.4-44}$$

where $\mathcal{E}_0 = R(0)$.
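
The two lattice recursions, Equations 10.4-36 and 10.4-41, with the initial condition of Equation 10.4-42, can be checked numerically against the direct-form predictor of Equation 10.4-37. The sketch below is illustrative only: it assumes real-valued data, zero initial conditions ($y(t) = 0$ for $t < 0$), and hypothetical stage multipliers; the function names are not from the text.

```python
def lattice_prediction_errors(y, stage_gains):
    """Run y through a lattice whose mth stage uses the multiplier a_mm.

    Returns the forward and backward residuals of the last stage.
    """
    p = len(stage_gains)
    delayed_b = [0.0] * p                 # b_m(t-1) for m = 0, ..., p-1
    forward, backward = [], []
    for sample in y:
        f = b = sample                    # f_0(t) = b_0(t) = y(t)
        current_b = []
        for m in range(p):
            current_b.append(b)           # keep b_m(t) for the next time step
            f_next = f - stage_gains[m] * delayed_b[m]   # Equation 10.4-36
            b_next = delayed_b[m] - stage_gains[m] * f   # Equation 10.4-41
            f, b = f_next, b_next
        delayed_b = current_b
        forward.append(f)
        backward.append(b)
    return forward, backward

def step_up(stage_gains):
    """Direct-form coefficients a_{p1..pp} from the stage multipliers a_mm."""
    a = []
    for m, a_mm in enumerate(stage_gains, start=1):
        a = [a[k - 1] - a_mm * a[m - k - 1] for k in range(1, m)] + [a_mm]
    return a

def direct_form_error(y, a):
    """Forward prediction error of Equation 10.4-37 with zero initial state."""
    out = []
    for t in range(len(y)):
        prediction = sum(a[k - 1] * y[t - k]
                         for k in range(1, len(a) + 1) if t - k >= 0)
        out.append(y[t] - prediction)
    return out

gains = [0.5, -0.3, 0.2]                  # hypothetical a_11, a_22, a_33
data = [1.0, -2.0, 0.5, 3.0, -1.0]
lattice_f, lattice_b = lattice_prediction_errors(data, gains)
direct_f = direct_form_error(data, step_up(gains))
```

The two forward residual sequences agree sample by sample, which is the equivalence of the transversal and lattice realizations derived above. Appending a stage to `gains` leaves the earlier stages untouched.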

The residuals $\{f_m(t)\}$ and $\{b_m(t)\}$ satisfy a number of interesting properties, as
described by Makhoul (1978). Most important of these are the orthogonality properties

$$\begin{aligned}
E[b_m(t)b_n(t)] &= \mathcal{E}_m \delta_{mn} \\
E[f_m(t+m)f_n(t+n)] &= \mathcal{E}_m \delta_{mn}
\end{aligned} \tag{10.4-45}$$

Furthermore, the cross correlation between $f_m(t)$ and $b_n(t)$, evaluated from the
definitions in Equations 10.4-37 and 10.4-38, is

$$E[f_m(t)b_n(t)] = \begin{cases} \mathcal{E}_m, & m \ge n = 0 \\ -a_{nn}\mathcal{E}_m, & m \ge n \ge 1 \\ 0, & m < n \end{cases} \qquad m, n \ge 0 \tag{10.4-46}$$

As a consequence of the orthogonality properties of the residuals, the different
sections of the lattice exhibit a form of independence that allows us to add or delete
one or more of the last stages without affecting the parameters of the remaining stages.
Since the residual mean square error $\mathcal{E}_m$ decreases monotonically with the number of
sections, $\mathcal{E}_m$ can be used as a performance index in determining where the lattice should
be terminated.

From the above discussion, we observe that a linear prediction filter can be implemented
either as a linear transversal filter or as a lattice filter. The lattice filter is
order-recursive, and, as a consequence, the number of sections it contains can be easily
increased or decreased without affecting the parameters of the remaining sections. In
contrast, the coefficients of a transversal filter obtained on the basis of the RLS criterion
are interdependent. This means that an increase or a decrease in the size of the filter
results in a change in all coefficients. Consequently, the Kalman algorithm described
in Section 10.4-1 is recursive in time but not in order.

Based on least-squares optimization, RLS lattice equalization algorithms have
been developed whose computational complexity grows linearly with the number N
of filter coefficients (lattice stages). Hence, the lattice equalizer structure is computationally
competitive with the direct-form fast RLS equalizer algorithms. For example,
Figure 10.4-3 illustrates the computational complexity (number of multiplications
and divisions per output symbol) of transversal and lattice, symbol-spaced DFE filter
structures. Observe that for equalizer lengths of fewer than 10 taps, the difference in
computational complexity among the different structures and algorithms is relatively
small. However, as the number of taps increases, the lattice RLS algorithm and the fast
(transversal) RLS algorithm are significantly less complex than the conventional and
square-root RLS algorithms. Of course, all the RLS algorithms are computationally
more complex than the LMS algorithm. RLS lattice algorithms are described in the
papers by Morf (1977), Morf and Lee (1978), Morf et al. (1977a, b, c), Satorius and
Alexander (1979), Satorius and Pack (1981), Ling and Proakis (1982, 1984c, 1985,
1986) and in the books by Proakis et al. (2002) and Haykin (2002).
FIGURE 10.4-3 

Computational complexity of DFE algorithms. 



FIGURE 10.4-4 

Equalizer types, structures, and algorithms. (Algorithm labels appearing in the figure: 
LMS, gradient, RLS, fast RLS, and square-root RLS.) 
RLS lattice algorithms have the distinct feature of being numerically robust to 
round-off error inherent in digital implementations of the algorithms. A treatment of 
their numerical properties may be found in the papers by Ling and Proakis (1984a) and 
Ling et al. (1986a, b). 

Figure 10.4-4 illustrates the different types of linear and nonlinear equalizers, the 
corresponding structures for their implementation, and the adaptive algorithms that 
may be used to adjust the equalizer coefficients. 


■ 10.5 

SELF-RECOVERING (BLIND) EQUALIZATION 

In the conventional zero-forcing or minimum MSE equalizers, we assumed that a known 
training sequence is transmitted to the receiver for the purpose of initially adjusting 
the equalizer coefficients. However, there are some applications, such as multipoint 
communication networks, where it is desirable for the receiver to synchronize to the 
received signal and to adjust the equalizer without having a known training sequence 
available. Equalization techniques based on initial adjustment of the coefficients without 
the benefit of a training sequence are said to be self-recovering or blind. 

Beginning with the paper by Sato (1975), three different classes of adaptive blind 
equalization algorithms have been developed over the past three decades. One class of 
algorithms is based on steepest descent for adaptation of the equalizer. A second class 
of algorithms is based on the use of second- and higher-order (generally, fourth-order) 
statistics of the received signal to estimate the channel characteristics and to design 
the equalizer. More recently, a third class of blind equalization algorithms based on 
the maximum-likelihood criterion has been investigated. In this section, we briefly 
describe these approaches and give several relevant references to the literature. 


10.5-1 Blind Equalization Based on the Maximum-Likelihood Criterion 

It is convenient to use the equivalent discrete-time channel model described in 
Section 9.3-2. Recall that the output of this channel model with ISI is 

$$ v_n = \sum_{k=0}^{L} f_k I_{n-k} + \eta_n \tag{10.5-1} $$

where $\{f_k\}$ are the equivalent discrete-time channel coefficients, $\{I_n\}$ represents the 
information sequence, and $\{\eta_n\}$ is a white Gaussian noise sequence. 

For a block of $N$ received data points, the (joint) probability density function of 
the received data vector $v = [v_1\ v_2\ \cdots\ v_N]^t$ conditioned on knowing the impulse 
response vector $f = [f_0\ f_1\ \cdots\ f_L]^t$ and the data vector $I = [I_1\ I_2\ \cdots\ I_N]^t$ is 

$$ p(v \mid f, I) = \frac{1}{(2\pi\sigma^2)^N} \exp\left( -\frac{1}{2\sigma^2} \sum_{n=1}^{N} \left| v_n - \sum_{k=0}^{L} f_k I_{n-k} \right|^2 \right) \tag{10.5-2} $$
The joint maximum-likelihood estimates of $f$ and $I$ are the values of these vectors 
that maximize the joint probability density function $p(v \mid f, I)$ or, equivalently, the 
values of $f$ and $I$ that minimize the term in the exponent. Hence, the ML solution is 
simply the minimum over $f$ and $I$ of the metric 

$$ \mathrm{DM}(I, f) = \sum_{n=1}^{N} \left| v_n - \sum_{k=0}^{L} f_k I_{n-k} \right|^2 = \| v - A f \|^2 \tag{10.5-3} $$


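where the matrix $A$ is called the data matrix and is defined as 

$$ A = \begin{bmatrix} I_1 & 0 & 0 & \cdots & 0 \\ I_2 & I_1 & 0 & \cdots & 0 \\ I_3 & I_2 & I_1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ I_N & I_{N-1} & I_{N-2} & \cdots & I_{N-L} \end{bmatrix} \tag{10.5-4} $$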


We make several observations. First of all, we note that when the data vector I 
(or the data matrix A) is known, as is the case when a training sequence is available 
at the receiver, the ML channel impulse response estimate obtained by minimizing 
Equation 10.5-3 over $f$ is 

$$ f_{ML}(I) = (A^H A)^{-1} A^H v \tag{10.5-5} $$
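The training-based estimate of Equation 10.5-5 can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the helper names `data_matrix` and `ml_channel_estimate` are hypothetical, symbols preceding the block are assumed to be zero (matching the zeros in the data matrix of Equation 10.5-4), and the least-squares problem is solved with `numpy.linalg.lstsq` rather than by forming the inverse explicitly.

```python
import numpy as np

def data_matrix(symbols, L):
    """Build the N x (L+1) data matrix A of Equation 10.5-4.

    Row n holds I_n, I_{n-1}, ..., I_{n-L}; symbols before the start of
    the block are taken as zero (an assumption of this sketch).
    """
    symbols = np.asarray(symbols, dtype=complex)
    N = len(symbols)
    A = np.zeros((N, L + 1), dtype=complex)
    for k in range(L + 1):
        A[k:, k] = symbols[:N - k]   # column k is the block delayed by k
    return A

def ml_channel_estimate(v, symbols, L):
    """Least-squares solution f = (A^H A)^{-1} A^H v of Equation 10.5-5."""
    A = data_matrix(symbols, L)
    f_hat, *_ = np.linalg.lstsq(A, np.asarray(v, dtype=complex), rcond=None)
    return f_hat

# Usage: a noiseless BPSK training block through an assumed 3-tap channel.
rng = np.random.default_rng(0)
f_true = np.array([1.0, 0.5, -0.2])
train = rng.choice([-1.0, 1.0], size=32)
v = data_matrix(train, L=2) @ f_true
f_hat = ml_channel_estimate(v, train, L=2)
```

In the noiseless case the estimate reproduces the channel taps; with additive noise it is the least-squares fit to the received block.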

On the other hand, when the channel impulse response $f$ is known, the optimum 
ML detector for the data sequence $I$ performs a trellis search (or tree search) by utilizing 
the Viterbi algorithm for the ISI channel. 

When neither $I$ nor $f$ is known, the minimization of the performance index 
$\mathrm{DM}(I, f)$ may be performed jointly over $I$ and $f$. Alternatively, $f$ may be estimated 
from the probability density function $p(v \mid f)$, which may be obtained by averaging 
$p(v, I \mid f)$ over all possible data sequences. That is, 

$$ p(v \mid f) = \sum_m p(v, I^{(m)} \mid f) = \sum_m p(v \mid I^{(m)}, f)\, P(I^{(m)}) \tag{10.5-6} $$

where $P(I^{(m)})$ is the probability of the sequence $I = I^{(m)}$, for $m = 1, 2, \ldots, M^N$, and 
$M$ is the size of the signal constellation. 

Channel estimation based on average over data sequences. As indicated in the 
above discussion, when both $I$ and $f$ are unknown, one approach is to estimate the 
impulse response $f$ after averaging the probability density $p(v, I \mid f)$ over all possible 
data sequences. Thus, we have 

$$ p(v \mid f) = \sum_m p(v \mid I^{(m)}, f)\, P(I^{(m)}) = \sum_m \left[ \frac{1}{(2\pi\sigma^2)^N} \exp\left( -\frac{\| v - A^{(m)} f \|^2}{2\sigma^2} \right) \right] P(I^{(m)}) \tag{10.5-7} $$


Then, the estimate of $f$ that maximizes $p(v \mid f)$ is the solution of the equation 

$$ \frac{\partial p(v \mid f)}{\partial f} \propto \sum_m P(I^{(m)}) \left( A^{(m)H} A^{(m)} f - A^{(m)H} v \right) \exp\left( -\frac{\| v - A^{(m)} f \|^2}{2\sigma^2} \right) = 0 \tag{10.5-8} $$

Hence, the estimate of $f$ may be expressed as 

$$ f = \left[ \sum_m P(I^{(m)})\, A^{(m)H} A^{(m)}\, g(v, A^{(m)}, f) \right]^{-1} \sum_m P(I^{(m)})\, g(v, A^{(m)}, f)\, A^{(m)H} v \tag{10.5-9} $$

where the function $g(v, A^{(m)}, f)$ is defined as 

$$ g(v, A^{(m)}, f) = \exp\left( -\frac{\| v - A^{(m)} f \|^2}{2\sigma^2} \right) \tag{10.5-10} $$

The resulting solution for the optimum $f$ is denoted by $f_{ML}$. 

Equation 10.5-9 is a non-linear equation for the estimate of the channel impulse 
response, given the received signal vector $v$. It is generally difficult to obtain the optimum 
solution by solving Equation 10.5-9 directly. On the other hand, it is relatively simple to 
devise a numerical method that solves for $f_{ML}$ recursively. Specifically, we may write 

$$ f^{(k+1)} = \left[ \sum_m P(I^{(m)})\, A^{(m)H} A^{(m)}\, g(v, A^{(m)}, f^{(k)}) \right]^{-1} \sum_m P(I^{(m)})\, g(v, A^{(m)}, f^{(k)})\, A^{(m)H} v \tag{10.5-11} $$


Once $f_{ML}$ is obtained from the solution of Equation 10.5-9 or 10.5-11, we may simply 
use the estimate in the minimization of the metric $\mathrm{DM}(I, f_{ML})$, given by Equation 
10.5-3, over all the possible data sequences. Thus, $I_{ML}$ is the sequence $I$ that minimizes 
$\mathrm{DM}(I, f_{ML})$, i.e., 

$$ \min_{I} \mathrm{DM}(I, f_{ML}) = \min_{I} \| v - A f_{ML} \|^2 \tag{10.5-12} $$

We know that the Viterbi algorithm is the computationally efficient algorithm for 
performing the minimization of $\mathrm{DM}(I, f_{ML})$ over $I$. 
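The recursion of Equation 10.5-11 can be sketched for a very short block as follows. This is an illustrative sketch under stated assumptions: real-valued binary symbols, equiprobable sequences (so that $P(I^{(m)})$ cancels), a known noise variance `sigma2`, and brute-force enumeration of all $M^N$ sequences, which is feasible only for small $N$. The function names are hypothetical. The weights $g$ are rescaled by their largest value before use; the common factor cancels between the two sums in Equation 10.5-11.

```python
import itertools
import numpy as np

def data_matrix(symbols, L):
    """Real-valued data matrix of Equation 10.5-4 (zero symbols before the block)."""
    symbols = np.asarray(symbols, dtype=float)
    N = len(symbols)
    A = np.zeros((N, L + 1))
    for k in range(L + 1):
        A[k:, k] = symbols[:N - k]
    return A

def blind_ml_recursion(v, L, sigma2, f_init, alphabet=(-1.0, 1.0), iters=20):
    """Iterate Equation 10.5-11 by enumerating every candidate data sequence."""
    v = np.asarray(v, dtype=float)
    mats = [data_matrix(s, L) for s in itertools.product(alphabet, repeat=len(v))]
    f = np.asarray(f_init, dtype=float)
    history = []   # log p(v|f^(k)) up to an additive constant
    for _ in range(iters):
        # Exponents of g(v, A^(m), f^(k)) from Equation 10.5-10.
        expo = np.array([-np.sum((v - A @ f) ** 2) / (2 * sigma2) for A in mats])
        peak = expo.max()
        w = np.exp(expo - peak)                  # rescaled weights g
        history.append(peak + np.log(w.sum()))
        lhs = sum(wm * (A.T @ A) for wm, A in zip(w, mats))
        rhs = sum(wm * (A.T @ v) for wm, A in zip(w, mats))
        f = np.linalg.solve(lhs, rhs)            # next iterate f^(k+1)
    return f, history

# Usage: an 8-symbol block through an assumed 2-tap channel.
rng = np.random.default_rng(1)
f_true = np.array([1.0, 0.4])
I_true = rng.choice([-1.0, 1.0], size=8)
v = data_matrix(I_true, 1) @ f_true + 0.1 * rng.standard_normal(8)
f_est, history = blind_ml_recursion(v, L=1, sigma2=0.01, f_init=[1.0, 0.0])
```

Each pass has the form of an expectation-maximization step, so the log-likelihood values stored in `history` should not decrease. The enumeration over all sequences makes the cost per iteration grow as $M^N$, which illustrates why the recursion is computationally intensive.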




This algorithm has two major drawbacks. First, the recursion for $f_{ML}$ given by 
Equation 10.5-11 is computationally intensive. Second, and perhaps more importantly, 
the estimate $f_{ML}$ is not as good as the maximum-likelihood estimate $f_{ML}(I)$ that is 
obtained when the sequence $I$ is known. Consequently, the error rate performance of 
the blind equalizer (the Viterbi algorithm) based on the estimate $f_{ML}$ is poorer than 
that based on $f_{ML}(I)$. Next, we consider joint channel and data estimation. 

Joint channel and data estimation. Here, we consider the joint optimization of 
the performance index $\mathrm{DM}(I, f)$ given by Equation 10.5-3. Since the elements of the 
impulse response vector $f$ are continuous and the elements of the data vector $I$ are 
discrete, one approach is to determine the maximum-likelihood estimate of $f$ for each 
possible data sequence and then to select the data sequence that minimizes $\mathrm{DM}(I, f)$ 
for each corresponding channel estimate. Thus, the channel estimate corresponding to 
the $m$th data sequence $I^{(m)}$ is 

$$ f_{ML}(I^{(m)}) = \left( A^{(m)H} A^{(m)} \right)^{-1} A^{(m)H} v \tag{10.5-13} $$

For the $m$th data sequence, the metric $\mathrm{DM}(I, f)$ becomes 

$$ \mathrm{DM}\left[ I^{(m)}, f_{ML}(I^{(m)}) \right] = \left\| v - A^{(m)} f_{ML}(I^{(m)}) \right\|^2 \tag{10.5-14} $$

Then, from the set of $M^N$ possible sequences, we select the data sequence that minimizes 
the cost function in Equation 10.5-14, i.e., we determine 

$$ \min_{I^{(m)}} \mathrm{DM}\left[ I^{(m)}, f_{ML}(I^{(m)}) \right] \tag{10.5-15} $$

The approach described above is an exhaustive computational search method with 
a computational complexity that grows exponentially with the length of the data block. 
We may select $N = L + 1$, and, thus, we shall have one channel estimate for each of the 
$M^L$ surviving sequences. Thereafter, we may continue to maintain a separate channel 
estimate for each surviving path of the Viterbi algorithm search through the trellis. This 
approach to joint channel and data estimation has been called per-survivor processing 
by Raheli et al. (1995). 
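The exhaustive search of Equations 10.5-13 through 10.5-15 can be sketched for a short block as follows. This is a brute-force illustration only, assuming real-valued binary symbols and zero symbols before the block; the function names are hypothetical, and the sketch does not implement per-survivor processing.

```python
import itertools
import numpy as np

def data_matrix(symbols, L):
    """Real-valued data matrix of Equation 10.5-4 (zero symbols before the block)."""
    symbols = np.asarray(symbols, dtype=float)
    N = len(symbols)
    A = np.zeros((N, L + 1))
    for k in range(L + 1):
        A[k:, k] = symbols[:N - k]
    return A

def joint_ml_search(v, L, alphabet=(-1.0, 1.0)):
    """Return (metric, sequence, channel estimate) minimizing Equation 10.5-14."""
    v = np.asarray(v, dtype=float)
    best = None
    for seq in itertools.product(alphabet, repeat=len(v)):
        A = data_matrix(seq, L)
        f_hat, *_ = np.linalg.lstsq(A, v, rcond=None)   # Equation 10.5-13
        metric = float(np.sum((v - A @ f_hat) ** 2))      # Equation 10.5-14
        if best is None or metric < best[0]:
            best = (metric, np.array(seq), f_hat)
    return best                                           # Equation 10.5-15

# Usage: a noiseless 6-symbol block through an assumed 2-tap channel.
f_true = np.array([1.0, 0.5])
I_true = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0])
v = data_matrix(I_true, 1) @ f_true
metric, seq, f_hat = joint_ml_search(v, L=1)
```

Note that the pair $(I, f)$ and the pair $(-I, -f)$ produce the same received block and therefore the same metric, so a blind search of this kind can identify the data only up to such an ambiguity. The loop visits all $M^N$ sequences, which is the exponential growth in complexity noted above.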

A similar approach has been proposed by Seshadri (1994). In essence, Seshadri’s 
algorithm is a type of generalized Viterbi algorithm (GVA) that retains K > 1 best esti- 
mates of the transmitted data sequence into each state of the trellis and the corresponding 
channel estimates. In Seshadri’s GVA, the search is identical to the conventional Viterbi 
algorithm (VA) from the beginning up to the $L$th stage of the trellis, i.e., up to the point 
where the received sequence $(v_1, v_2, \ldots, v_L)$ has been processed. Hence, up to the $L$th 
stage, an exhaustive search is performed. Associated with each data sequence $I^{(m)}$, 
there is a corresponding channel estimate $f_{ML}(I^{(m)})$. From this stage on, the search is 
modified, to retain $K > 1$ surviving sequences and associated channel estimates per 
state instead of only one sequence per state. Thus, the GVA is used for processing the 
received signal sequence $\{v_n, n \ge L + 1\}$. The channel estimate is updated recursively 
at each stage using the LMS algorithm to further reduce the computational 
complexity. Simulation results given in the paper by Seshadri (1994) indicate that this GVA 
blind equalization algorithm performs rather well at moderate signal-to-noise ratios 
with $K = 4$. Hence, there is a modest increase in the computational complexity of the 
GVA compared with that for the conventional VA. However, there are additional 
computations involved with the estimation and updating of the channel estimates $f(I^{(m)})$ 
associated with each of the surviving data estimates. 

An alternative joint estimation algorithm that avoids the least-squares computation 
for channel estimation has been devised by Zervas et al. (1991). In this algorithm, 
the order for performing the joint minimization of the performance index $\mathrm{DM}(I, f)$ 
is reversed. That is, a channel impulse response, say $f = f^{(1)}$, is selected and then 
the conventional VA is used to find the optimum sequence for this channel impulse 
response. Then, we may modify $f^{(1)}$ in some manner to $f^{(2)} = f^{(1)} + \Delta f^{(1)}$ and 
repeat the optimization over the data sequences $\{I^{(m)}\}$. 

Based on this general approach, Zervas et al. developed a new ML blind equalization 
algorithm, which is called a quantized-channel algorithm. The algorithm operates over 
a grid in the channel space, which becomes finer and finer by using the ML criterion 
to confine the estimated channel in the neighborhood of the original unknown channel. 
This algorithm leads to an efficient parallel implementation, and its storage requirements 
are only those of the VA. 


10.5-2 Stochastic Gradient Algorithms 

Another class of blind equalization algorithms consists of stochastic-gradient iterative 
equalization schemes that apply a memoryless non-linearity to the output of a linear FIR 
equalization filter in order to generate the “desired response” in each iteration. 

Let us begin with an initial guess of the coefficients of the optimum equalizer, which 
we denote by $\{c_n\}$. Then, the convolution of the channel response with the equalizer 
response may be expressed as 

$$ \{c_n\} * \{f_n\} = \{\delta_n\} + \{e_n\} \tag{10.5-16} $$

where $\{\delta_n\}$ is the unit sample sequence and $\{e_n\}$ denotes the error sequence that results 
from our initial guess of the equalizer coefficients. If we convolve the equalizer impulse 
response with the received sequence $\{v_n\}$, we obtain 

$$ \begin{aligned} \{\hat{I}_n\} &= \{v_n\} * \{c_n\} \\ &= \{I_n\} * \{f_n\} * \{c_n\} + \{\eta_n\} * \{c_n\} \\ &= \{I_n\} * (\{\delta_n\} + \{e_n\}) + \{\eta_n\} * \{c_n\} \\ &= \{I_n\} + \{I_n\} * \{e_n\} + \{\eta_n\} * \{c_n\} \end{aligned} \tag{10.5-17} $$

In Equation 10.5-17 the term $\{I_n\}$ represents the desired data sequence, the term 
$\{I_n\} * \{e_n\}$ represents the residual ISI, and the term $\{\eta_n\} * \{c_n\}$ represents the additive 
noise. Our problem is to utilize the deconvolved sequence $\{\hat{I}_n\}$ to find the “best” estimate 
of a desired response, denoted in general by $\{d_n\}$. In the case of adaptive equalization 
using a training sequence, $\{d_n\} = \{I_n\}$. In a blind equalization mode, we shall generate 
a desired response from $\{\hat{I}_n\}$. 

The mean square error (MSE) criterion may be employed to determine the “best” 
estimate of $\{I_n\}$ from the observed equalizer output $\{\hat{I}_n\}$. Since the transmitted sequence 
$\{I_n\}$ has a non-Gaussian PDF, the MSE estimate is a non-linear transformation of $\{\hat{I}_n\}$. 
FIGURE 10.5-1 

Adaptive blind equalization with stochastic 
gradient algorithms. 


In general, the best estimate $\{d_n\}$ is given by 

$$ \begin{aligned} d_n &= g(\hat{I}_n) && \text{(memoryless)} \\ d_n &= g(\hat{I}_n, \hat{I}_{n-1}, \ldots, \hat{I}_{n-m}) && \text{($m$th-order memory)} \end{aligned} \tag{10.5-18} $$

where $g(\cdot)$ is a non-linear function. The sequence $\{d_n\}$ is then used to generate an error 
signal, which is fed back into the adaptive equalization filter, as shown in Figure 10.5-1. 
Let us consider the nonlinear function based on the MSE criterion. 

A well-known classical estimation problem is the following. If the equalizer output 
$\hat{I}_n$ is expressed as 

$$ \hat{I}_n = I_n + \tilde{\eta}_n \tag{10.5-19} $$

where $\tilde{\eta}_n$ is assumed to be zero-mean Gaussian (the central limit theorem may be 
invoked here for the residual ISI and the additive noise), $\{I_n\}$ and $\{\tilde{\eta}_n\}$ are statistically 
independent, and $\{I_n\}$ are statistically independent and identically distributed random 
variables, then the MSE estimate of $\{I_n\}$ is 

$$ d_n = E(I_n \mid \hat{I}_n) \tag{10.5-20} $$

which is a non-linear function of the equalizer output when $\{I_n\}$ is non-Gaussian. 
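As a worked illustration of Equation 10.5-20, consider the simplest case, which is assumed here and is not taken from the text: real-valued binary symbols $I_n = \pm 1$ with equal probabilities, and a Gaussian disturbance of variance $\sigma^2$ (a symbol introduced only for this example). Applying Bayes' rule to the two hypotheses gives the conditional mean in closed form.

```latex
\begin{aligned}
E(I_n \mid \hat{I}_n)
  &= \frac{(+1)\, e^{-(\hat{I}_n - 1)^2 / 2\sigma^2} + (-1)\, e^{-(\hat{I}_n + 1)^2 / 2\sigma^2}}
          {e^{-(\hat{I}_n - 1)^2 / 2\sigma^2} + e^{-(\hat{I}_n + 1)^2 / 2\sigma^2}} \\
  &= \frac{e^{\hat{I}_n / \sigma^2} - e^{-\hat{I}_n / \sigma^2}}
          {e^{\hat{I}_n / \sigma^2} + e^{-\hat{I}_n / \sigma^2}}
   = \tanh\!\left( \frac{\hat{I}_n}{\sigma^2} \right)
\end{aligned}
```

Under these assumptions the non-linearity $g(\cdot)$ is a soft limiter: it approaches the hard decision $\mathrm{sgn}(\hat{I}_n)$ as $\sigma^2 \to 0$ and is approximately linear in $\hat{I}_n$ when $\sigma^2$ is large.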

Table 10.5-1 illustrates the general form of existing blind equalization algorithms 
that are based on LMS adaptation. We observe that the basic difference among these 
algorithms lies in the choice of the memoryless non-linearity. The most widely used 
algorithm in practice is the Godard algorithm, sometimes also called the constant- 
modulus algorithm (CMA). 

It is apparent from Table 10.5-1 that the output sequence $\{d_n\}$ obtained by taking 
a non-linear function of the equalizer output plays the role of the desired response or 
a training sequence. It is also apparent that these algorithms are simple to implement, 
since they are basically LMS-type algorithms. As such, we expect that the convergence 
characteristics of these algorithms will depend on the autocorrelation matrix of the 
received data $\{v_n\}$. 

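With regard to convergence, the adaptive LMS-type algorithms converge in the 
mean when 

$$ E[v_n g^*(\hat{I}_n)] = E[v_n \hat{I}_n^*] \tag{10.5-21} $$

and, in the mean square sense, when 

$$ \begin{aligned} E[C_n^t v_n g^*(\hat{I}_n)] &= E[C_n^t v_n \hat{I}_n^*] \\ E[\hat{I}_n g^*(\hat{I}_n)] &= E[|\hat{I}_n|^2] \end{aligned} \tag{10.5-22} $$

TABLE 10.5-1 

Stochastic Gradient Algorithms for Blind Equalization 

Equalizer tap coefficients:        $\{c_n,\ 0 \le n \le N-1\}$ 
Received signal sequence:          $\{v_n\}$ 
Equalizer output sequence:         $\{\hat{I}_n\} = \{v_n\} * \{c_n\}$ 
Equalizer error sequence:          $\{e_n\} = g(\hat{I}_n) - \hat{I}_n$ 
Tap coefficient update equation:   $c_{n+1} = c_n + \Delta v_n^* e_n$ 

Algorithm            Non-linearity $g(\hat{I}_n)$ 

Godard               $\dfrac{\hat{I}_n}{|\hat{I}_n|} \left( |\hat{I}_n| + R_2 |\hat{I}_n| - |\hat{I}_n|^3 \right)$, $R_2 = \dfrac{E\{|I_n|^4\}}{E\{|I_n|^2\}}$ 

Sato                 $\zeta\, \mathrm{csgn}(\hat{I}_n)$, $\zeta = \dfrac{E\{[\mathrm{Re}(I_n)]^2\}}{E\{|\mathrm{Re}(I_n)|\}}$ 

Benveniste-Goursat   $\hat{I}_n + k_1(\tilde{I}_n - \hat{I}_n) + k_2 |\tilde{I}_n - \hat{I}_n| [\zeta\, \mathrm{csgn}(\hat{I}_n) - \hat{I}_n]$, 
                     $k_1$ and $k_2$ are positive constants 

Stop-and-go          $\hat{I}_n + \tfrac{1}{2} A (\tilde{I}_n - \hat{I}_n) + \tfrac{1}{2} B (\tilde{I}_n - \hat{I}_n)^*$, $(A, B) = (2, 0), (1, 1), (1, -1)$, or $(0, 0)$, 
                     depending on the signs of the decision-directed error $\tilde{I}_n - \hat{I}_n$ 
                     and the error $\zeta\, \mathrm{csgn}(\hat{I}_n) - \hat{I}_n$ 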


Therefore, it is required that the equalizer output $\{\hat{I}_n\}$ satisfy Equation 10.5-22. 
Note that Equation 10.5-22 states that the autocorrelation of $\{\hat{I}_n\}$ (the right-hand side) 
equals the cross correlation between $\hat{I}_n$ and a non-linear transformation of $\hat{I}_n$ (left-hand 
side). Processes that satisfy this property are called Bussgang (1952), as named by 
Bellini (1986). In summary, the algorithms given in Table 10.5-1 converge when the 
equalizer output sequence $\hat{I}_n$ satisfies the Bussgang property. 
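The general form in Table 10.5-1 can be sketched in code as follows, for the Godard and Sato non-linearities only. This is a minimal illustration with hypothetical function names. The complex signum `csgn` is assumed here to act separately on the real and imaginary parts, and the tap vector and the vector of received samples are assumed to be ordered so that their inner product gives the equalizer output.

```python
import numpy as np

def csgn(x):
    """Complex signum (assumed definition): sgn(Re x) + j sgn(Im x)."""
    return np.sign(np.real(x)) + 1j * np.sign(np.imag(x))

def sato_gain(constellation):
    """zeta = E{[Re(I)]^2} / E{|Re(I)|} for equiprobable constellation points."""
    re = np.real(np.asarray(constellation))
    return np.mean(re ** 2) / np.mean(np.abs(re))

def godard_r2(constellation):
    """R_2 = E{|I|^4} / E{|I|^2} for equiprobable constellation points."""
    a = np.abs(np.asarray(constellation))
    return np.mean(a ** 4) / np.mean(a ** 2)

def g_sato(y, zeta):
    return zeta * csgn(y)

def g_godard(y, r2):
    a = np.abs(y)                      # assumes y != 0
    return (y / a) * (a + r2 * a - a ** 3)

def blind_lms_step(c, v_vec, g, step):
    """One update c <- c + step * conj(v) * e, with e = g(y) - y."""
    y = np.dot(c, v_vec)               # equalizer output
    e = g(y) - y                       # error from the non-linearity
    return c + step * np.conj(v_vec) * e, y

# Usage with an assumed QPSK constellation (points +/-1 +/- j).
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
zeta, r2 = sato_gain(qpsk), godard_r2(qpsk)
c0 = np.array([1.0 + 0j, 0.0])
c1, y = blind_lms_step(c0, np.array([0.5 + 0.3j, 1.0 + 0j]),
                       lambda t: g_godard(t, r2), step=0.01)
```

With the Godard non-linearity, the error $g(\hat{I}_n) - \hat{I}_n$ reduces algebraically to $\hat{I}_n (R_2 - |\hat{I}_n|^2)$, so the update depends only on the modulus of the output and not on its phase.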

The basic limitation of stochastic gradient algorithms is their relatively slow con- 
vergence. Some improvement in the convergence rate can be achieved by modifying 
the adaptive algorithms from LMS-type to RLS-type. 


Godard algorithm. The Godard blind equalization algorithm is a steepest-descent 
algorithm that is widely used in practice when a training sequence is not available. 
Let us describe this algorithm in more detail, assuming a general QAM signal 
constellation. 

Godard considered the problem of combined equalization and carrier phase re- 
covery and tracking. The carrier phase tracking is performed at baseband, following 
the equalizer as shown in Figure 10.5-2. Based on this structure, we may express the 
equalizer output as 

$$ \hat{I}_k = \sum_{n=-K}^{K} c_n v_{k-n} \tag{10.5-23} $$
FIGURE 10.5-2 

Godard scheme for combined adaptive (blind) equalization and carrier phase tracking. 


and the input to the decision device as $\hat{I}_k \exp(-j\hat{\phi}_k)$, where $\hat{\phi}_k$ is the carrier phase 
estimate in the $k$th symbol interval. 

If the desired symbol were known, we could form the error signal 

$$ \varepsilon_k = I_k - \hat{I}_k e^{-j\hat{\phi}_k} \tag{10.5-24} $$

and minimize the MSE with respect to $\hat{\phi}_k$ and $\{c_n\}$, i.e., 

$$ \min_{\hat{\phi}_k,\, C} E\left( \left| I_k - \hat{I}_k e^{-j\hat{\phi}_k} \right|^2 \right) \tag{10.5-25} $$

This criterion leads us to use the LMS algorithm for recursively estimating $C$ and $\phi_k$. 
The LMS algorithm based on knowledge of the transmitted sequence is 

$$ C_{k+1} = C_k + \Delta_c \left( I_k - \hat{I}_k e^{-j\hat{\phi}_k} \right) V_k^* e^{j\hat{\phi}_k} \tag{10.5-26} $$

$$ \hat{\phi}_{k+1} = \hat{\phi}_k + \Delta_\phi\, \mathrm{Im}\left( I_k \hat{I}_k^* e^{j\hat{\phi}_k} \right) \tag{10.5-27} $$

where $\Delta_c$ and $\Delta_\phi$ are the step-size parameters for the two recursive equations. Note 
that these recursive equations are coupled together. Unfortunately, these equations will 
not converge, in general, when the desired symbol sequence $\{I_k\}$ is unknown. 

The approach proposed by Godard is to use a criterion that depends on the amount 
of intersymbol interference at the output of the equalizer but one that is independent of 
the QAM signal constellation and the carrier phase. For example, a cost function that 
is independent of carrier phase and has the property that its minimum leads to a small 
MSE is 


$$ G^{(p)} = E\left( |\hat{I}_k|^p - |I_k|^p \right)^2 \tag{10.5-28} $$

where $p$ is a positive integer. Minimization of $G^{(p)}$ with respect to the equalizer 
coefficients results in the equalization of the signal amplitude only. Based on this 
observation, Godard selected a more general cost function, called the dispersion of 
order $p$, defined as 

$$ D^{(p)} = E\left( |\hat{I}_k|^p - R_p \right)^2 \tag{10.5-29} $$

where $R_p$ is a positive real constant. As in the case of $G^{(p)}$, we observe that $D^{(p)}$ is 
independent of the carrier phase. 

Minimization of $D^{(p)}$ with respect to the equalizer coefficients can be performed 
recursively according to the steepest-descent algorithm 

$$ C_{k+1} = C_k - \Delta_p \frac{dD^{(p)}}{dC_k} \tag{10.5-30} $$

where $\Delta_p$ is the step-size parameter. By differentiating $D^{(p)}$ and dropping the 
expectation operation, we obtain the following LMS-type algorithm for adjusting the equalizer 
coefficients: 

$$ C_{k+1} = C_k + \Delta_p V_k^* \hat{I}_k |\hat{I}_k|^{p-2} \left( R_p - |\hat{I}_k|^p \right) \tag{10.5-31} $$

where the optimum choice of $R_p$ is 

$$ R_p = \frac{E(|I_k|^{2p})}{E(|I_k|^p)} \tag{10.5-32} $$

As expected, the recursion in Equation 10.5-31 for $C_k$ does not require knowledge 
of the carrier phase. Carrier phase tracking may be carried out in a decision-directed 
mode according to Equation 10.5-27, with the decision $\tilde{I}_k$ substituted in place of $I_k$. 

Of particular importance is the case p = 2, which leads to the relatively simple 
algorithm 


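$$ \begin{aligned} C_{k+1} &= C_k + \Delta_p V_k^* \hat{I}_k \left( R_2 - |\hat{I}_k|^2 \right) \\ \hat{\phi}_{k+1} &= \hat{\phi}_k + \Delta_\phi\, \mathrm{Im}\left( \tilde{I}_k \hat{I}_k^* e^{j\hat{\phi}_k} \right) \end{aligned} \tag{10.5-33} $$

where $\tilde{I}_k$ is the output decision based on $\hat{I}_k$, and 

$$ R_2 = \frac{E(|I_k|^4)}{E(|I_k|^2)} \tag{10.5-34} $$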


Convergence of the algorithm given in Equation 10.5-33 is demonstrated in the 
paper by Godard (1980). Initially, the equalizer coefficients are set to zero except for 
the center (reference) tap, which is set according to the condition 


$$ |c_0|^2 > \frac{E|I_k|^4}{2 |x_0|^2 \left[ E(|I_k|^2) \right]^2} \tag{10.5-35} $$

which is sufficient, but not necessary, for convergence of the algorithm. Simulations 
performed by Godard on simulated telephone channels with typical frequency-response 
characteristics and transmission rates of 7200-12,000 bits/s indicate that the 
algorithm in Equation 10.5-31 performs well and leads to convergence in 5000-20,000 
iterations, depending on the signal constellation. Initially, the eye pattern was closed 
prior to equalization. The number of iterations required for convergence is about an 
order of magnitude greater than the number required to equalize the channels with 
a known training sequence. No apparent difficulties were encountered in using the 
decision-directed phase estimation algorithm in Equation 10.5-33 from the beginning 
of the equalizer adjustment process. 
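The tap update of Equation 10.5-31 (and its special case $p = 2$ in Equation 10.5-33) can be sketched as follows. This is an illustrative sketch rather than Godard's implementation: the function names, the channel taps, the step size, and the unit center-tap initialization are all assumptions, the taps are indexed from 0 rather than from $-K$ to $K$, and the carrier phase recursion is omitted.

```python
import numpy as np

def dispersion_constant(constellation, p=2):
    """R_p = E(|I|^{2p}) / E(|I|^p) of Equation 10.5-32, equiprobable points."""
    a = np.abs(np.asarray(constellation))
    return np.mean(a ** (2 * p)) / np.mean(a ** p)

def godard_update(c, v_vec, r_p, step, p=2):
    """One step of Equation 10.5-31; returns the new taps and the output."""
    y = np.dot(c, v_vec)               # equalizer output for this symbol
    a = np.abs(y)
    c_new = c + step * np.conj(v_vec) * y * a ** (p - 2) * (r_p - a ** p)
    return c_new, y

def godard_equalize(v, num_taps, r_p, step, p=2):
    """Run the update over a received sequence, starting from a center tap."""
    v = np.asarray(v, dtype=complex)
    c = np.zeros(num_taps, dtype=complex)
    c[num_taps // 2] = 1.0             # assumed initial center-tap value
    out = np.zeros(len(v) - num_taps + 1, dtype=complex)
    for k in range(len(out)):
        v_vec = v[k:k + num_taps][::-1]    # most recent sample first
        c, out[k] = godard_update(c, v_vec, r_p, step, p)
    return c, out

# Usage: assumed QPSK symbols through an assumed mild 3-tap channel.
rng = np.random.default_rng(2)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
symbols = rng.choice(qpsk, size=2000)
v = np.convolve(symbols, [1.0, 0.3, -0.1])[:len(symbols)]
r2 = dispersion_constant(qpsk)
c, out = godard_equalize(v, num_taps=7, r_p=r2, step=1e-3)
```

For $p = 2$, re-evaluating the output with the same input vector after one update scales it by the real factor $1 + \Delta_p \|V_k\|^2 (R_2 - |\hat{I}_k|^2)$, so each step pushes the output modulus toward $\sqrt{R_2}$ without altering its phase. The number of iterations needed in practice depends on the channel and step size and is not established by this sketch.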


10.5-3 Blind Equalization Algorithms Based on Second- and Higher-Order 
Signal Statistics 

It is well known that second-order statistics (autocorrelation) of the received signal 
sequence provide information on the magnitude of the channel characteristics, but not 
on the phase. However, this statement is not correct if the autocorrelation function of 
the received signal is periodic, as is the case for a digitally modulated signal. In such 
a case, it is possible to obtain a measurement of the amplitude and the phase of the 
channel from the received signal. This cyclostationarity property of the received signal 
forms the basis for a channel estimation algorithm devised by Tong et al. (1994, 1995). 

It is also possible to estimate the channel response from the received signal by using 
higher-order statistical methods. In particular, the impulse response of a linear, 
time-invariant discrete-time system can be obtained explicitly from cumulants of the received signal, 
provided that the channel input is non-Gaussian. We describe the following simple 
method, due to Giannakis (1987) and Giannakis and Mendel (1989), for estimation 
of the channel impulse response from fourth-order cumulants of the received signal 
sequence. For simplicity, we assume that the received signal sequence is real-valued. 
The fourth-order cumulant is defined as 

$$ \begin{aligned} c(v_k, v_{k+m}, v_{k+n}, v_{k+l}) \equiv c_r(m, n, l) ={}& E(v_k v_{k+m} v_{k+n} v_{k+l}) \\ &- E(v_k v_{k+m}) E(v_{k+n} v_{k+l}) \\ &- E(v_k v_{k+n}) E(v_{k+m} v_{k+l}) \\ &- E(v_k v_{k+l}) E(v_{k+m} v_{k+n}) \end{aligned} \tag{10.5-36} $$

(The fourth-order cumulant of a Gaussian signal process is zero.) Consequently, it 
follows that 

$$ c_r(m, n, l) = c(I_k, I_{k+m}, I_{k+n}, I_{k+l}) \sum_{k=0}^{\infty} f_k f_{k+m} f_{k+n} f_{k+l} \tag{10.5-37} $$

For a statistically independent and identically distributed input sequence $\{I_n\}$ to 
the channel, $c(I_k, I_{k+m}, I_{k+n}, I_{k+l}) = \kappa$, a constant, which is called the kurtosis. Then, 
if the length of the channel response is $L + 1$, we may let $m = n = l = -L$ so that 

$$ c_r(-L, -L, -L) = \kappa f_L f_0^3 \tag{10.5-38} $$

Similarly, if we let $m = 0$, $n = L$, and $l = p$, we obtain 

$$ c_r(0, L, p) = \kappa f_L f_0^2 f_p \tag{10.5-39} $$
If we combine Equations 10.5-38 and 10.5-39, we obtain the impulse response within 
a scale factor as 

$$ f_p = f_0 \frac{c_r(0, L, p)}{c_r(-L, -L, -L)}, \qquad p = 1, 2, \ldots, L \tag{10.5-40} $$

The cumulants $c_r(m, n, l)$ are estimated from sample averages of the received signal 
sequence $\{v_n\}$. 
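The relation in Equation 10.5-40 can be checked numerically with the following sketch. To keep the example deterministic, it uses the theoretical cumulants of Equation 10.5-37 in place of sample averages; the function names, the channel taps, and the kurtosis value are assumptions chosen for illustration, and the channel is taken to be real-valued with nonzero $f_0$ and $f_L$.

```python
import numpy as np

def output_cumulant(f, kappa, m, n, l):
    """c_r(m, n, l) = kappa * sum_k f_k f_{k+m} f_{k+n} f_{k+l} (Equation 10.5-37).

    Taps outside the range 0..L are zero, so only in-range terms contribute.
    """
    L = len(f) - 1
    total = 0.0
    for k in range(L + 1):
        idx = (k, k + m, k + n, k + l)
        if all(0 <= i <= L for i in idx):
            total += f[idx[0]] * f[idx[1]] * f[idx[2]] * f[idx[3]]
    return kappa * total

def channel_from_cumulants(cum, L, f0=1.0):
    """Equation 10.5-40: f_p = f_0 * c_r(0, L, p) / c_r(-L, -L, -L)."""
    denom = cum(-L, -L, -L)
    return np.array([f0 * cum(0, L, p) / denom for p in range(L + 1)])

# Usage: assumed 3-tap channel; kappa = -2 is the fourth-order cumulant of
# equiprobable +/-1 symbols (E[I^4] - 3 E[I^2]^2 = 1 - 3).
f_true = np.array([1.0, 0.5, -0.25])
kappa = -2.0
cum = lambda m, n, l: output_cumulant(f_true, kappa, m, n, l)
f_rec = channel_from_cumulants(cum, L=2, f0=f_true[0])
```

The kurtosis cancels in the ratio, which is why the method recovers the taps only to within the scale factor $f_0$. With cumulants estimated from data, the accuracy depends on the number of samples averaged.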

Another approach based on higher-order statistics is due to Hatzinakos and Nikias 
(1991). They have introduced the first polyspectra-based adaptive blind equalization 
method named the tricepstrum equalization algorithm (TEA). This method estimates 
the channel response characteristics by using the complex cepstrum of the 
fourth-order cumulants (tricepstrum) of the received signal sequence $\{v_n\}$. TEA depends 
only on fourth-order cumulants of $\{v_n\}$ and is capable of separately reconstructing 
the minimum-phase and maximum-phase characteristics of the channel. The channel 
equalizer coefficients are then computed from the measured channel characteristics. 
The basic approach used in TEA is to compute the tricepstrum of the received sequence 
$\{v_n\}$, which is the inverse (three-dimensional) Fourier transform of the logarithm of the 
trispectrum of $\{v_n\}$. [The trispectrum is the three-dimensional discrete Fourier 
transform of the fourth-order cumulant sequence $c_r(m, n, l)$.] The equalizer coefficients are 
then computed from the cepstral coefficients. 

By separating the channel estimation from the channel equalization, it is possible 
to use any type of equalizer for the ISI, i.e., either linear, or decision-feedback, or 
maximum-likelihood sequence detection. The major disadvantage with this class of al- 
gorithms is the large amount of data and the inherent computational complexity involved 
in the estimation of the higher-order moments (cumulants) of the received signal. 

In conclusion, we have provided an overview of three classes of blind equalization 
algorithms that find applications in digital communications. Of the three families of 
algorithms described, those based on the maximum-likelihood criterion for jointly 
estimating the channel impulse response and the data sequence are optimal and require 
relatively few received signal samples for performing channel estimation. However, 
the computational complexity of the algorithms is large when the ISI spans many 
symbols. On some channels, such as the mobile radio channel, where the span of the 
ISI is relatively short, these algorithms are simple to implement. However, on telephone 
channels, where the ISI spans many symbols but is usually not too severe, the LMS-type 
(stochastic gradient) algorithms are generally employed. 


■ 10.6 

BIBLIOGRAPHICAL NOTES AND REFERENCES 

Adaptive equalization for digital communications was developed by Lucky (1965, 
1966). His algorithm was based on the peak distortion criterion and led to the zero- 
forcing algorithm. Lucky’s work was a major breakthrough, which led to the rapid 
development of high-speed modems within 5 years of publication of his work. Concur- 
rently, the LMS algorithm was devised by Widrow (1966, 1970), and its use for adaptive 
equalization for two-dimensional (in-phase and quadrature components) signals was 
described and analyzed in a tutorial paper by Proakis and Miller (1969). 

A tutorial treatment of adaptive equalization algorithms that were developed during 
the period 1965-1975 is given by Proakis (1975). A more recent tutorial treatment of 
adaptive equalization is given in the paper by Qureshi (1985). The major breakthrough 
in adaptive equalization techniques, beginning with the work of Lucky in 1965 coupled 
with the development of trellis-coded modulation, which was described by Ungerboeck 
and Csajka (1976), has led to the development of commercially available high-speed 
modems with a capability of speeds exceeding 30,000 bits/s on telephone channels. 

The use of a more rapidly converging algorithm for adaptive equalization was pro- 
posed by Godard (1974). Our derivation of the RLS (Kalman) algorithm, described 
in Section 10.4-1, follows the approach outlined by Picinbono (1978). RLS lattice 
algorithms for general signal estimation applications were developed by Morf (1977), 
Morf and Lee (1978), and Morf et al. (1977a, b,c). The applications of these algorithms 
have been investigated by several researchers, including Makhoul (1978), Satorius and 
Pack (1981), Satorius and Alexander (1979), and Ling and Proakis (1982, 1984a-c, 
1985, 1986). The fast RLS Kalman algorithm for adaptive equalization was first de- 
scribed by Falconer and Ljung (1978). The above references are just a few of the 
important papers that have been published on RLS algorithms for adaptive equalization 
and other applications. A comprehensive treatment of RLS algorithms is given in the 
books by Haykin (2002) and Proakis et al. (2002). 

Sato’s (1975) original work on blind equalization was focused on PAM (one- 
dimensional) signal constellations. Subsequently it was generalized to two-dimensional 
and multidimensional signal constellations in the algorithms devised by Godard (1980), 
Benveniste and Goursat (1984), Sato et al. (1986), Foschini (1985), Picchi and Prati 
(1987), and Shalvi and Weinstein (1990). Blind equalization methods based on the use 
of second- and higher-order moments of the received signal were proposed by Giannakis 
(1987), Giannakis and Mendel (1989), Hatzinakos and Nikias (1991), and Tong et al. 
(1994, 1995). The use of the maximum-likelihood criterion for joint channel estimation 
and data detection has been investigated and treated in papers by Sato (1994), Seshadri 
(1994), Ghosh and Weber (1991), Zervas et al. (1991), and Raheli et al. (1995). Finally, 
the convergence characteristics of stochastic gradient blind equalization algorithms 
have been investigated by Ding (1990), Ding et al. (1989), and Johnson (1991). 


PROBLEMS 

10.1 An equivalent discrete-time channel with white Gaussian noise is shown in Figure P10.1. 

a. Suppose we use a linear equalizer to equalize the channel. Determine the tap 
coefficients $c_{-1}$, $c_0$, $c_1$ of a three-tap equalizer. To simplify the computation, let the AWGN 
be zero. 

b. The tap coefficients of the linear equalizer in (a) are determined recursively via the 
algorithm 

$$ C_{k+1} = C_k - \Delta G_k, \qquad C_k = [c_{-1k}\ c_{0k}\ c_{1k}]^t $$

where $G_k = \Gamma C_k - \xi$ is the gradient vector and $\Delta$ is the step size. Determine the 
range of values of $\Delta$ to ensure convergence of the recursive algorithm. To simplify 
the computation, let the AWGN be zero. 

c. Determine the tap weights of a DFE with two feedforward taps and one feedback tap. 
To simplify the computation, let the AWGN be zero. 

FIGURE P10.1 


10.2 Refer to Problem 9.49 and answer the following questions. 

a. Determine the maximum value of $\Delta$ that can be used to ensure that the equalizer 
coefficients converge during operation in the adaptive mode. 

b. What is the variance of the self-noise generated by the three-tap equalizer when 
operating in an adaptive mode, as a function of $\Delta$? Suppose it is desired to limit 
the variance of the self-noise to 10 percent of the minimum MSE for the three-tap 
equalizer when $N_0 = 0.1$. What value of $\Delta$ would you select? 

c. If the optimum coefficients of the equalizer are computed recursively by the method 
of steepest descent, the recursive equation can be expressed in the form 

$$ C_{n+1} = (I - \Delta\Gamma) C_n + \Delta\xi $$

where $I$ is the identity matrix. The above represents a set of three coupled 
first-order difference equations. They can be decoupled by a linear transformation that 
diagonalizes the matrix $\Gamma$. That is, $\Gamma = U \Lambda U^t$ where $\Lambda$ is the diagonal matrix 
having the eigenvalues of $\Gamma$ as its diagonal elements and $U$ is the (normalized) modal 
matrix that can be obtained from your answer to Problem 9.49(b). Let $C' = U^t C$ and 
determine the steady-state solution for $C'$. From this, evaluate $C = (U^t)^{-1} C' = U C'$ 
and, thus, show that your answer agrees with the result obtained in Problem 9.49(a). 

10.3 When a periodic pseudorandom sequence of length $N$ is used to adjust the coefficients of 
an $N$-tap linear equalizer, the computations can be performed efficiently in the frequency 
domain by use of the discrete Fourier transform (DFT). Suppose that $\{y_n\}$ is a sequence of 
$N$ received samples (taken at the symbol rate) at the equalizer input. Then the computation 
of the equalizer coefficients is performed as follows. 

a. Compute the DFT of one period of the equalizer input sequence $\{y_n\}$, i.e., 

$$ Y_k = \sum_{n=0}^{N-1} y_n e^{-j2\pi nk/N} $$

b. Compute the desired equalizer spectrum 

$$ C_k = \frac{X_k Y_k^*}{|Y_k|^2}, \qquad k = 0, 1, \ldots, N - 1 $$

where $\{X_k\}$ is the precomputed DFT of the training sequence. 

c. Compute the inverse DFT of $\{C_k\}$ to obtain the equalizer coefficients $\{c_n\}$. Show that 
this procedure in the absence of noise yields an equalizer whose frequency response 
is equal to the frequency response of the inverse folded channel spectrum at the $N$ 
uniformly spaced frequencies $f_k = k/NT$, $k = 0, 1, \ldots, N - 1$. 

10.4 Show that the gradient vector in the minimization of the MSE may be expressed as

$$G_k = -E(\varepsilon_k V_k^*)$$

where the error $\varepsilon_k = I_k - \hat{I}_k$, and the estimate of $G_k$, i.e.,

$$\hat{G}_k = -\varepsilon_k V_k^*$$

satisfies the condition that $E(\hat{G}_k) = G_k$.


10.5 The tap-leakage LMS algorithm proposed in the paper by Gitlin et al. (1982) may be
expressed as

$$C_N(n+1) = wC_N(n) + \Delta\varepsilon(n)V_N^*(n)$$

where $0 < w < 1$, $\Delta$ is the step size, and $V_N(n)$ is the data vector at time $n$. Determine
the condition for the convergence of the mean value of $C_N(n)$.


10.6 Consider the random process

$$x(n) = gv(n) + w(n), \quad n = 0, 1, \ldots, M-1$$

where $v(n)$ is a known sequence, $g$ is a random variable with $E(g) = 0$, and $E(g^2) = G$.
The process $w(n)$ is a white noise sequence with

$$\gamma_{ww}(m) = \sigma_w^2\,\delta(m)$$

Determine the coefficients of the linear estimator for $g$, that is,

$$\hat{g} = \sum_{n=0}^{M-1} h(n)x(n)$$

that minimize the mean square error.


10.7 A digital transversal filter can be realized in the frequency-sampling form with system
function (see Problem 9.56)

$$H(z) = \frac{1 - z^{-M}}{M}\sum_{k=0}^{M-1}\frac{H_k}{1 - e^{j2\pi k/M}z^{-1}} = H_1(z)H_2(z)$$

where $H_1(z)$ is the comb filter, $H_2(z)$ is the parallel bank of resonators, and $\{H_k\}$ are the
values of the discrete Fourier transform (DFT).

a. Suppose that this structure is implemented as an adaptive filter using the LMS algorithm
to adjust the filter (DFT) parameters $\{H_k\}$. Give the time-update equation for
these parameters. Sketch the adaptive filter structure.




b. Suppose that this structure is used as an adaptive channel equalizer in which the desired 
signal is 




Ink 


With this form for the desired signal, what advantages are there in the LMS adaptive
algorithm for the DFT coefficients $\{H_k\}$ over the direct-form structure with coefficients
$\{h(n)\}$? [See Proakis (1970).]

10.8 Consider the performance index

$$J = h^2 + 40h + 28$$

Suppose that we search for the minimum of $J$ by using the steepest-descent algorithm

$$h(n+1) = h(n) - \tfrac{1}{2}\Delta\,g(n)$$

where $g(n)$ is the gradient.

a. Determine the range of values of $\Delta$ that provides an overdamped system for the
adjustment process.

b. Plot the expression for $J$ as a function of $n$ for a value of $\Delta$ in this range.

10.9 Determine the coefficients $a_1$ and $a_2$ for the linear predictor shown in Figure P10.9, given
that the autocorrelation $\gamma_{xx}(m)$ of the input signal is

$$\gamma_{xx}(m) = b^{|m|}, \quad 0 < b < 1$$

FIGURE P10.9

10.10 Determine the lattice filter and its optimum reflection coefficients corresponding to the
linear predictor in Problem 10.9.

10.11 Consider the adaptive FIR filter shown in Figure P10.11. The system $C(z)$ is characterized
by the system function


Determine the optimum coefficients of the adaptive transversal (FIR) filter $B(z) = b_0 + b_1 z^{-1}$
that minimize the mean square error. The additive noise is white with variance
$\sigma_w^2 = 0.1$.

FIGURE P10.11

10.12 An $N \times N$ correlation matrix $\Gamma$ has eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_N > 0$ and associated
eigenvectors $v_1, v_2, \ldots, v_N$. Such a matrix can be represented as

$$\Gamma = \sum_{i=1}^{N}\lambda_i v_i v_i^H$$

a. If $\Gamma = \Gamma^{1/2}\Gamma^{1/2}$, where $\Gamma^{1/2}$ is the square root of $\Gamma$, show that $\Gamma^{1/2}$ can be represented
as

$$\Gamma^{1/2} = \sum_{i=1}^{N}\lambda_i^{1/2} v_i v_i^H$$

b. Using this representation, determine a procedure for computing $\Gamma^{1/2}$.



Multichannel and Multicarrier Systems 


In some applications, it is desirable to transmit the same information-bearing signal 
over several channels. This mode of transmission is used primarily in situations where 
there is a high probability that one or more of the channels will be unreliable from 
time to time. For example, radio channels such as ionospheric scatter and tropospheric 
scatter suffer from signal fading due to multipath, which renders the channels unreliable 
for short periods of time. As another example, multichannel signaling is sometimes 
employed in wireless communication systems as a means of overcoming the effects 
of interference of the transmitted signal. By transmitting the same information over 
multiple channels, we are providing signal diversity, which the receiver can exploit to 
recover the information. 

Another form of multichannel communications is multiple carrier transmission, 
where the frequency band of the channel is subdivided into a number of subchannels 
and information is transmitted on each of the subchannels. A rationale for subdividing 
the frequency band of a channel into a number of narrowband channels is given below. 

In this chapter, we consider both multichannel signal transmission and multicarrier 
transmission. The focus is on the performance of such systems in AWGN channels. 
The performance of multichannel and multicarrier transmission in fading channels is 
treated in Chapter 13. We begin with a treatment of multichannel transmission. 


11.1 MULTICHANNEL DIGITAL COMMUNICATIONS IN AWGN CHANNELS

In this section, we confine our attention to multichannel signaling over fixed channels 
that differ only in attenuation and phase shift. The specific model for the multichannel 
digital signaling system is illustrated in Figure 11.1-1 and may be described as follows. 
The signal waveforms, in general, are expressed as

$$s_m^{(n)}(t) = \mathrm{Re}\left[s_{lm}^{(n)}(t)\,e^{j2\pi f_n t}\right], \quad 0 \le t \le T, \quad n = 1, 2, \ldots, L, \quad m = 1, 2, \ldots, M \tag{11.1-1}$$

FIGURE 11.1-1
Model of a multichannel digital communication system.


where $L$ is the number of channels and $M$ is the number of waveforms. The waveforms
are assumed to have equal energy and to be equally probable a priori. The waveforms
transmitted over the $L$ channels are scaled by the attenuation factors $\{\alpha_n\}$,
phase-shifted by $\{\phi_n\}$, and corrupted by additive noise. The equivalent lowpass signals
received from the $L$ channels may be expressed as

$$r_l^{(n)}(t) = \alpha_n e^{j\phi_n}s_{lm}^{(n)}(t) + z_n(t), \quad 0 \le t \le T, \quad n = 1, 2, \ldots, L, \quad m = 1, 2, \ldots, M \tag{11.1-2}$$

where $\{s_{lm}^{(n)}(t)\}$ are the equivalent lowpass transmitted waveforms and $\{z_n(t)\}$ represent
the additive noise processes on the $L$ channels. We assume that $\{z_n(t)\}$ are mutually
statistically independent and identically distributed Gaussian noise random processes.

We consider two types of processing at the receiver, namely, coherent detection
and noncoherent detection. The receiver for coherent detection estimates the channel
parameters $\{\alpha_n\}$ and $\{\phi_n\}$ and uses the estimates in computing the decision variables.
Suppose we define $g_n = \alpha_n e^{j\phi_n}$ and let $\hat{g}_n$ be the estimate of $g_n$. The multichannel
receiver correlates each of the $L$ received signals with a replica of the corresponding
transmitted signals, multiplies each of the correlator outputs by the corresponding
estimates $\{\hat{g}_n^*\}$, and sums the resulting signals. Thus, the decision variables for coherent
detection are the correlation metrics

$$CM_m = \sum_{n=1}^{L}\mathrm{Re}\left[\hat{g}_n^*\int_0^T r_l^{(n)}(t)\,s_{lm}^{(n)*}(t)\,dt\right], \quad m = 1, 2, \ldots, M \tag{11.1-3}$$


In noncoherent detection, no attempt is made to estimate the channel parameters. 
The demodulator may base its decision either on the sum of the envelopes (envelope 
detection) or the sum of the squared envelopes (square-law detection) of the matched 
filter outputs. In general, the performance obtained with envelope detection differs little 
from the performance obtained with square-law detection in AWGN. However, square- 
law detection of multichannel signaling in AWGN channels is considerably easier 
to analyze than envelope detection. Therefore, we confine our attention to square-law
detection of the received signals of the $L$ channels, which produces the decision
variables

$$CM_m = \sum_{n=1}^{L}\left|\int_0^T r_l^{(n)}(t)\,s_{lm}^{(n)*}(t)\,dt\right|^2, \quad m = 1, 2, \ldots, M \tag{11.1-4}$$
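As a discrete-time illustration of the two receiver structures, the following sketch approximates the correlation integrals by sums over samples and forms the coherent and square-law decision variables. The function names, the unit sample spacing, and the toy two-channel example (noiseless, with perfect gain estimates) are assumptions made only for illustration.

```python
import cmath

def correlate(r, s):
    """Approximate the integral of r(t) s*(t) dt by a sum over samples (dt = 1)."""
    return sum(ri * si.conjugate() for ri, si in zip(r, s))

def coherent_metric(r_chan, s_chan, g_hat):
    """Coherent combining: sum over channels of Re[conj(g_hat_n) * <r_n, s_n>]."""
    return sum((g.conjugate() * correlate(r, s)).real
               for r, s, g in zip(r_chan, s_chan, g_hat))

def square_law_metric(r_chan, s_chan):
    """Noncoherent combining: sum over channels of |<r_n, s_n>|^2."""
    return sum(abs(correlate(r, s)) ** 2 for r, s in zip(r_chan, s_chan))

# Two orthogonal lowpass waveforms, reused on both channels (illustrative choice).
s1 = [1 + 0j, 1 + 0j, 0j, 0j]
s2 = [0j, 0j, 1 + 0j, 1 + 0j]

# Hypothetical channel gains g_n = alpha_n * exp(j * phi_n) for L = 2 channels.
gains = [0.8 * cmath.exp(0.3j), 0.5 * cmath.exp(-1.1j)]

# Noiseless reception of waveform 1 on both channels.
received = [[g * x for x in s1] for g in gains]

cm_coherent = [coherent_metric(received, [s, s], gains) for s in (s1, s2)]
cm_square_law = [square_law_metric(received, [s, s]) for s in (s1, s2)]
```

In both cases the metric for the transmitted waveform is the larger one; with noise added to `received`, the same functions produce the noisy decision variables.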


Let us consider binary signaling first, and assume that $s_{l1}^{(n)}(t)$, $n = 1, 2, \ldots, L$, are the
$L$ transmitted waveforms. Then an error is committed if $CM_2 > CM_1$, or, equivalently,
if the difference $D = CM_1 - CM_2 < 0$. For noncoherent detection, this difference
may be expressed as

$$D = \sum_{n=1}^{L}\left(|X_n|^2 - |Y_n|^2\right) \tag{11.1-5}$$

where the variables $\{X_n\}$ and $\{Y_n\}$ are defined as

$$X_n = \int_0^T r_l^{(n)}(t)\,s_{l1}^{(n)*}(t)\,dt, \quad n = 1, 2, \ldots, L$$

$$Y_n = \int_0^T r_l^{(n)}(t)\,s_{l2}^{(n)*}(t)\,dt, \quad n = 1, 2, \ldots, L \tag{11.1-6}$$

The $\{X_n\}$ are mutually independent and identically distributed complex Gaussian random
variables. The same statement applies to the variables $\{Y_n\}$. However, for any $n$,
$X_n$ and $Y_n$ may be correlated. For coherent detection, the difference $D = CM_1 - CM_2$
may be expressed as

$$D = \frac{1}{2}\sum_{n=1}^{L}\left(X_nY_n^* + X_n^*Y_n\right) \tag{11.1-7}$$

where, by definition,

$$Y_n = \hat{g}_n, \quad n = 1, 2, \ldots, L$$

$$X_n = \int_0^T r_l^{(n)}(t)\left[s_{l1}^{(n)*}(t) - s_{l2}^{(n)*}(t)\right]dt \tag{11.1-8}$$

If the estimates $\{\hat{g}_n\}$ are obtained from observation of the received signal over one or
more signaling intervals, as described in Appendix C, their statistical characteristics
are described by the Gaussian distribution. Then the $\{Y_n\}$ are characterized as mutually
independent and identically distributed Gaussian random variables. The same statement
applies to the variables $\{X_n\}$. As in noncoherent detection, we allow for correlation
between $X_n$ and $Y_n$, but not between $X_m$ and $Y_n$ for $m \ne n$.


11.1-1 Binary Signals 

In Appendix B, we derive the probability that the general quadratic form

$$D = \sum_{n=1}^{L}\left(A|X_n|^2 + B|Y_n|^2 + CX_nY_n^* + C^*X_n^*Y_n\right) \tag{11.1-9}$$

in complex-valued Gaussian random variables is less than zero, where $A$ and $B$ are
real constants and $C$ may be either a real or a complex-valued constant. This probability,
which is given in Equation B-21 of Appendix B, is the probability of error for
binary multichannel signaling in AWGN. A number of special cases are of particular
importance.

If the binary signals are antipodal and the estimates of $\{g_n\}$ are perfect, as in
coherent PSK, the probability of error takes the simple form

$$P_b = Q\left(\sqrt{2\gamma_b}\right) \tag{11.1-10}$$

where

$$\gamma_b = \frac{\mathcal{E}}{N_0}\sum_{n=1}^{L}|g_n|^2 = \frac{\mathcal{E}}{N_0}\sum_{n=1}^{L}\alpha_n^2 \tag{11.1-11}$$

is the SNR per bit. If the channels are all identical, $\alpha_n = \alpha$ for all $n$ and, hence,

$$\gamma_b = \frac{L\mathcal{E}}{N_0}\alpha^2 \tag{11.1-12}$$

We observe that $L\mathcal{E}$ is the total transmitted signal energy for the $L$ signals. The
interpretation of this result is that the receiver combines the energy from the $L$ channels
in an optimum manner. That is, there is no loss in performance in dividing the total
transmitted signal energy among the $L$ channels. The same performance is obtained as
in the case in which a single waveform having energy $L\mathcal{E}$ is transmitted on one channel.
This behavior holds true only if the estimates $\hat{g}_n = g_n$, for all $n$. If the estimates are
not perfect, a loss in performance occurs, the amount of which depends on the quality
of the estimates, as described in Appendix C.
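The claim that coherent combining with perfect estimates loses nothing can be checked numerically. The sketch below evaluates the error probability of antipodal signaling, $Q(\sqrt{2\gamma_b})$, with $\gamma_b$ equal to $\mathcal{E}/N_0$ times the sum of the squared attenuations, once for energy split over $L$ channels and once for a single channel carrying $L\mathcal{E}$. The function names and the numerical values are illustrative assumptions.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), expressed through the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def snr_per_bit(energy, n0, alphas):
    """gamma_b = (E / N0) * sum of alpha_n^2 over the channels."""
    return energy / n0 * sum(a * a for a in alphas)

def coherent_psk_pb(energy, n0, alphas):
    """Bit error probability of antipodal signaling with perfect channel estimates."""
    return q_function(math.sqrt(2.0 * snr_per_bit(energy, n0, alphas)))

L = 4
pb_split = coherent_psk_pb(1.0, 1.0, [1.0] * L)   # energy E on each of L identical channels
pb_single = coherent_psk_pb(L * 1.0, 1.0, [1.0])  # energy L*E on one channel
```

The two probabilities agree because only the total received SNR per bit enters the expression.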

Perfect estimates for $\{g_n\}$ constitute an extreme case. At the other extreme, we
have binary DPSK signaling. In DPSK, the estimates $\{\hat{g}_n\}$ are simply the (normalized)
signal-plus-noise samples at the outputs of the matched filters in the previous signaling
interval. This is the simplest estimate that one might consider using in estimating $\{g_n\}$.
For binary DPSK, the probability of error obtained from Equation B-21 is

$$P_b = \frac{1}{2^{2L-1}}\,e^{-\gamma_b}\sum_{n=0}^{L-1}c_n\gamma_b^n \tag{11.1-13}$$

where, by definition,

$$c_n = \frac{1}{n!}\sum_{k=0}^{L-1-n}\binom{2L-1}{k} \tag{11.1-14}$$


and $\gamma_b$ is the SNR per bit defined in Equation 11.1-11 and, for identical channels,
in Equation 11.1-12. This result can be compared with the single-channel ($L = 1$)
error probability. To simplify the comparison, we assume that the $L$ channels have
identical attenuation factors. Thus, for the same value of $\gamma_b$, the performance of the
multichannel system is poorer than that of the single-channel system. That is, splitting
the total transmitted energy among $L$ channels results in a loss in performance, the
amount of which depends on $L$.
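To make the dependence on $L$ concrete, the sketch below evaluates the DPSK error probability of Equations 11.1-13 and 11.1-14 and then finds, by bisection, the SNR per bit needed to reach a target error probability. The loss is reported as the extra SNR (in dB) relative to $L = 1$. The function names, the bisection bracket, and the target value used in the usage note are illustrative assumptions.

```python
import math

def c_coeff(n, L):
    """c_n = (1/n!) * sum_{k=0}^{L-1-n} binom(2L-1, k)."""
    return sum(math.comb(2 * L - 1, k) for k in range(L - n)) / math.factorial(n)

def dpsk_pb(gamma_b, L):
    """Bit error probability of binary DPSK combined over L channels."""
    series = sum(c_coeff(n, L) * gamma_b ** n for n in range(L))
    return math.exp(-gamma_b) * series / 2 ** (2 * L - 1)

def required_snr(target_pb, L, hi=200.0):
    """Bisection for the gamma_b at which dpsk_pb(gamma_b, L) equals target_pb.

    Relies on dpsk_pb decreasing monotonically in gamma_b; hi is an assumed bracket.
    """
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dpsk_pb(mid, L) > target_pb:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def combining_loss_db(target_pb, L):
    """Extra SNR per bit (dB) needed with L channels relative to a single channel."""
    return 10.0 * math.log10(required_snr(target_pb, L) / required_snr(target_pb, 1))
```

For example, `combining_loss_db(1e-5, 2)` and `combining_loss_db(1e-5, 4)` give the loss at an error probability of $10^{-5}$; for $L = 1$ the function returns zero by construction.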

A loss in performance also occurs in square-law detection of orthogonal signals
transmitted over $L$ channels. For binary orthogonal signaling, the expression for
the probability of error is identical in form to that for binary DPSK given in
Equation 11.1-13, except that $\gamma_b$ is replaced by $\tfrac{1}{2}\gamma_b$. That is, binary orthogonal signaling
with noncoherent detection is 3 dB poorer than binary DPSK. However, the loss in
performance due to noncoherent combination of the signals received on the $L$ channels
is identical to that for binary DPSK.

Figure 11.1-2 illustrates the loss resulting from noncoherent (square-law) combining
of the $L$ signals as a function of $L$. The probability of error is not shown, but it can
be easily obtained from the curve of the expression

$$P_b = \tfrac{1}{2}\,e^{-\gamma_b} \tag{11.1-15}$$

which is the error probability of binary DPSK shown in Figure 4.5-5, and then degrading
the required SNR per bit, $\gamma_b$, by the noncoherent combining loss corresponding to
the value of $L$.


FIGURE 11.1-2
Combining loss in noncoherent detection and combination of binary multichannel signals.

11.1-2 M-ary Orthogonal Signals

Now let us consider $M$-ary orthogonal signaling with square-law detection and combination
of the signals on the $L$ channels. The decision variables are given by
Equation 11.1-4. Suppose that the signals $s_{l1}^{(n)}(t)$, $n = 1, 2, \ldots, L$, are transmitted over the
$L$ AWGN channels. Then, the decision variables are expressed as

$$CM_1 = U_1 = \sum_{n=1}^{L}\left|2\mathcal{E}\alpha_n + N_{n1}\right|^2$$

$$CM_m = U_m = \sum_{n=1}^{L}\left|N_{nm}\right|^2, \quad m = 2, 3, \ldots, M \tag{11.1-16}$$

where the $\{N_{nm}\}$ are circular complex-valued zero-mean Gaussian random variables
with variance $\sigma^2 = 2\mathcal{E}N_0$ per real and imaginary component. Hence $U_1$ is described
statistically as a noncentral chi-square random variable with $2L$ degrees of freedom
and noncentrality parameter

$$s^2 = \sum_{n=1}^{L}(2\mathcal{E}\alpha_n)^2 = 4\mathcal{E}^2\sum_{n=1}^{L}\alpha_n^2 \tag{11.1-17}$$

Using Equation 2.3-29, we obtain the PDF of $U_1$ as

$$p(u_1) = \frac{1}{4\mathcal{E}N_0}\left(\frac{u_1}{s^2}\right)^{(L-1)/2}\exp\left(-\frac{s^2 + u_1}{4\mathcal{E}N_0}\right)I_{L-1}\left(\frac{s\sqrt{u_1}}{2\mathcal{E}N_0}\right), \quad u_1 \ge 0 \tag{11.1-18}$$


On the other hand, the $\{U_m\}$, $m = 2, 3, \ldots, M$, are statistically independent and
identically chi-square-distributed random variables, each having $2L$ degrees of freedom.
Using Equation 2.3-21, we obtain the PDF for $U_m$ as

$$p(u_m) = \frac{1}{(4\mathcal{E}N_0)^L(L-1)!}\,u_m^{L-1}e^{-u_m/4\mathcal{E}N_0}, \quad u_m \ge 0, \quad m = 2, 3, \ldots, M \tag{11.1-19}$$

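The probability of a symbol error is

$$P_e = 1 - P_c = 1 - P(U_2 < U_1, U_3 < U_1, \ldots, U_M < U_1) = 1 - \int_0^\infty\left[P(U_2 < u_1 \mid U_1 = u_1)\right]^{M-1}p(u_1)\,du_1 \tag{11.1-20}$$

But

$$P(U_2 < u_1 \mid U_1 = u_1) = 1 - \exp\left(-\frac{u_1}{4\mathcal{E}N_0}\right)\sum_{k=0}^{L-1}\frac{1}{k!}\left(\frac{u_1}{4\mathcal{E}N_0}\right)^k \tag{11.1-21}$$

Hence,

$$P_e = 1 - \int_0^\infty\left[1 - e^{-u_1/4\mathcal{E}N_0}\sum_{k=0}^{L-1}\frac{1}{k!}\left(\frac{u_1}{4\mathcal{E}N_0}\right)^k\right]^{M-1}p(u_1)\,du_1$$

$$\phantom{P_e} = 1 - \int_0^\infty\left[1 - e^{-v}\sum_{k=0}^{L-1}\frac{v^k}{k!}\right]^{M-1}\left(\frac{v}{\gamma}\right)^{(L-1)/2}e^{-(\gamma + v)}I_{L-1}\left(2\sqrt{\gamma v}\right)dv \tag{11.1-22}$$

where

$$\gamma = \frac{\mathcal{E}}{N_0}\sum_{n=1}^{L}\alpha_n^2$$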


The integral in Equation 11.1-22 can be evaluated numerically. It is also possible
to expand the term $(1 - x)^{M-1}$ in Equation 11.1-22 and carry out the integration term
by term. This approach yields an expression for $P_e$ in terms of finite sums.

An alternative approach is to use the union bound

$$P_e < (M - 1)P_2(L) \tag{11.1-23}$$

where $P_2(L)$ is the probability of error in choosing between $U_1$ and any one of the
$M - 1$ decision variables $\{U_m\}$, $m = 2, 3, \ldots, M$. From our previous discussion on
the performance of binary orthogonal signaling, we have

$$P_2(L) = \frac{1}{2^{2L-1}}\,e^{-k\gamma_b/2}\sum_{n=0}^{L-1}c_n\left(\tfrac{1}{2}k\gamma_b\right)^n \tag{11.1-24}$$

where $k = \log_2 M$ is the number of bits per symbol and $c_n$ is given by Equation 11.1-14.
For relatively small values of $M$, the union bound in Equation 11.1-23 is sufficiently
tight for most practical applications.
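The union bound is straightforward to evaluate. The following sketch computes $P_2(L)$ and the bound $(M - 1)P_2(L)$ for given SNR per bit, $M$, and $L$, taking $k = \log_2 M$. The function names are illustrative, and clipping the bound at 1 is an added convenience that is not part of the bound itself.

```python
import math

def c_coeff(n, L):
    """c_n = (1/n!) * sum_{k=0}^{L-1-n} binom(2L-1, k)."""
    return sum(math.comb(2 * L - 1, k) for k in range(L - n)) / math.factorial(n)

def p2(gamma_b, k_bits, L):
    """Pairwise error probability for square-law combining of orthogonal signals."""
    x = 0.5 * k_bits * gamma_b
    series = sum(c_coeff(n, L) * x ** n for n in range(L))
    return math.exp(-x) * series / 2 ** (2 * L - 1)

def union_bound(gamma_b, M, L):
    """Union bound on the symbol error probability, clipped at 1 (assumed convenience)."""
    k_bits = math.log2(M)
    return min(1.0, (M - 1) * p2(gamma_b, k_bits, L))
```

With $M = 2$ and $L = 1$ the bound reduces to $\tfrac{1}{2}e^{-\gamma_b/2}$, the error probability of noncoherent binary orthogonal signaling on a single channel.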


11.2 MULTICARRIER COMMUNICATIONS

From our treatment of nonideal linear filter channels in Chapters 9 and 10, we have
observed that such channels introduce ISI, which degrades performance compared with 
the ideal channel. The degree of performance degradation depends on the frequency- 
response characteristics. Furthermore, the complexity of the receiver increases as the 
span of the ISI increases. 

In this section, we consider the transmission of information on multiple carriers 
contained within the allocated channel bandwidth. The primary motivation for transmit- 
ting the data on multiple carriers is to reduce ISI and, thus, eliminate the performance 
degradation that is incurred in single carrier modulation. 


11.2-1 Single-Carrier Versus Multicarrier Modulation 

Given a particular channel characteristic, the communication system designer must 
decide how to efficiently utilize the available channel bandwidth in order to transmit 
the information reliably within the transmitter power constraint and receiver complexity 
constraints. For a nonideal linear filter channel, one option is to employ a single-carrier
system in which the information sequence is transmitted serially at some specified rate
$R$ symbols/s. In such a channel, the time dispersion is generally much greater than
the reciprocal of the symbol rate, and, hence, ISI results from the nonideal frequency-
response characteristics of the channel. As we have observed, an equalizer is necessary
to compensate for the channel distortion.

As an example of such an approach, we cite the modems designed to transmit data 
through voice-band channels in the switched telephone network, which are based on the 
International Telecommunications Union (ITU) standard V.34. Such modems employ 
QAM impressed on a single carrier that is selected along with the symbol rate from a 
small set of specified values to obtain the maximum throughput at the desired level of
performance (error rate). The channel frequency-response characteristics are measured
upon initial setup of the telephone circuit, and the symbol rate and carrier frequency
are selected based on this measurement.

An alternative approach to the design of a bandwidth-efficient communication system
in the presence of channel distortion is to subdivide the available channel bandwidth
into a number of subchannels, such that each subchannel is nearly ideal. To elaborate,
suppose that $C(f)$ is the frequency response of a nonideal, band-limited channel with
a bandwidth $W$, and that the power spectral density of the additive Gaussian noise is
$S_{nn}(f)$. Then we divide the bandwidth $W$ into $N = W/\Delta f$ subbands of width $\Delta f$,
where $\Delta f$ is chosen sufficiently small that $|C(f)|^2/S_{nn}(f)$ is approximately a constant
within each subband. Furthermore, we select the transmitted signal power to be
distributed in frequency as $P(f)$, subject to the constraint that

$$\int_W P(f)\,df \le P_{av} \tag{11.2-1}$$

where $P_{av}$ is the available average power of the transmitter. Then we transmit the data
on these $N$ subchannels. Before proceeding further with this approach, we evaluate the
capacity of the nonideal additive Gaussian noise channel.


11.2-2 Capacity of a Nonideal Linear Filter Channel 


Recall that the capacity of an ideal, band-limited, AWGN channel is

$$C = W\log_2\left(1 + \frac{P_{av}}{WN_0}\right) \tag{11.2-2}$$

where $C$ is the capacity in bits/s, $W$ is the channel bandwidth, $P_{av}$ is the average
transmitted power, and $N_0$ is the power spectral density of the additive noise. In a
multicarrier system, with $\Delta f$ sufficiently small, the $i$th subchannel has capacity

$$C_i = \Delta f\log_2\left[1 + \frac{\Delta f\,P(f_i)|C(f_i)|^2}{\Delta f\,S_{nn}(f_i)}\right] \tag{11.2-3}$$

Hence, the total capacity of the channel is

$$C = \sum_{i=1}^{N}C_i = \Delta f\sum_{i=1}^{N}\log_2\left[1 + \frac{P(f_i)|C(f_i)|^2}{S_{nn}(f_i)}\right] \tag{11.2-4}$$

In the limit as $\Delta f \to 0$, we obtain the capacity of the overall channel in bits/s as

$$C = \int_W\log_2\left[1 + \frac{P(f)|C(f)|^2}{S_{nn}(f)}\right]df \tag{11.2-5}$$


Under the constraint on $P(f)$ given by Equation 11.2-1, the choice of $P(f)$ that
maximizes $C$ may be determined by maximizing the integral

$$\int_W\left\{\log_2\left[1 + \frac{P(f)|C(f)|^2}{S_{nn}(f)}\right] + \lambda P(f)\right\}df \tag{11.2-6}$$

where $\lambda$ is a Lagrange multiplier, which is chosen to satisfy the constraint. By using
the calculus of variations to perform the maximization, we find that the optimum
distribution of transmitted signal power is obtained from the solution to the equation

$$\frac{1}{P(f) + S_{nn}(f)/|C(f)|^2} + \lambda = 0 \tag{11.2-7}$$


Therefore, $P(f) + S_{nn}(f)/|C(f)|^2$ must be a constant, whose value is adjusted to satisfy
the average power constraint in Equation 11.2-1. That is,

$$P(f) = \begin{cases} K - S_{nn}(f)/|C(f)|^2, & f \in W \\ 0, & f \notin W \end{cases} \tag{11.2-8}$$


This expression for the channel capacity of a nonideal linear filter channel with additive
Gaussian noise is due to Holsinger (1964). The basic interpretation of this result is that
the signal power should be high when the channel SNR $|C(f)|^2/S_{nn}(f)$ is high, and
low when the channel SNR is low. This result on the transmitted power distribution
is illustrated in Figure 11.2-1. Observe that if $S_{nn}(f)/|C(f)|^2$ is interpreted as the
bottom of a bowl of unit depth, and we pour an amount of water equal to $P_{av}$ into
the bowl, the water will distribute itself in the bowl so as to achieve capacity. This is
called the water-filling interpretation of the optimum power distribution as a function
of frequency.
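The water-filling rule can be sketched for a channel split into a finite number of subbands of width $\Delta f$. The code below finds the water level $K$ by bisection so that the allocated power equals $P_{av}$, and then evaluates the discrete capacity sum of Equation 11.2-4. The function names and the four-subband example are illustrative assumptions. The allocation is also clipped at zero in subbands whose bowl bottom lies above the water level, a standard refinement that Equation 11.2-8 does not write out explicitly.

```python
import math

def water_fill(noise_to_gain, delta_f, p_av, iters=200):
    """Return (K, powers) with powers[i] = max(K - noise_to_gain[i], 0).

    noise_to_gain[i] stands for S_nn(f_i) / |C(f_i)|^2 in subband i.
    K is found by bisection so that delta_f * sum(powers) equals p_av.
    """
    lo = min(noise_to_gain)                    # no power is allocated at this level
    hi = max(noise_to_gain) + p_av / delta_f   # allocated power is at least p_av here
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        used = delta_f * sum(max(level - n, 0.0) for n in noise_to_gain)
        if used > p_av:
            hi = level
        else:
            lo = level
    level = 0.5 * (lo + hi)
    return level, [max(level - n, 0.0) for n in noise_to_gain]

def capacity(powers, noise_to_gain, delta_f):
    """Discrete capacity sum in bits/s: delta_f * sum of log2(1 + P_i / n_i)."""
    return delta_f * sum(math.log2(1.0 + p / n) for p, n in zip(powers, noise_to_gain))

# Hypothetical four-subband channel with unit subband width and unit total power.
bowl = [0.1, 0.2, 0.5, 2.0]
K, p_opt = water_fill(bowl, delta_f=1.0, p_av=1.0)
c_opt = capacity(p_opt, bowl, 1.0)
c_uniform = capacity([0.25] * 4, bowl, 1.0)
```

In this example the noisiest subband receives no power, and the water-filling allocation yields a larger capacity than spreading the same power uniformly.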

It is interesting to note that the channel capacity is smallest when the channel
SNR $|C(f)|^2/S_{nn}(f)$ is a constant for all $f \in W$. In this case, $P(f)$ is a constant for
all $f \in W$. Equivalently, if the channel frequency response is ideal, i.e., $C(f) = 1$
for $f \in W$, then the worst Gaussian noise power distribution, from the viewpoint of
maximizing capacity, is white Gaussian noise.

FIGURE 11.2-1
The optimum power distribution based on water-filling interpretation.

11.2-3 Orthogonal Frequency Division Multiplexing (OFDM)

The above development suggests that multicarrier modulation that divides the available
channel bandwidth into subbands of relatively narrow width $\Delta f = W/N$ provides a
solution that could yield transmission rates close to channel capacity. The signal in
each subband may be independently coded and modulated at a synchronous symbol
rate of $\Delta f$ (symbol interval $1/\Delta f$). If $\Delta f$ is small enough, the channel frequency
response $C(f)$ is essentially constant across each subband. Hence, the intersymbol
interference is negligible. Such a subdivision of the channel bandwidth $W$ is illustrated
in Figure 11.2-2.

With each subband (or subchannel), we associate a sinusoidal carrier signal of the
form

$$s_k(t) = \cos 2\pi f_k t, \quad k = 0, 1, \ldots, N-1 \tag{11.2-9}$$

where $f_k$ is the mid frequency in the $k$th subchannel. By selecting the symbol rate $1/T$
in each of the subchannels to be equal to the frequency separation $\Delta f$ of the adjacent
subcarriers, the subcarriers are orthogonal over the symbol interval $T$, independent of
the relative phase relationship between subcarriers. That is,

$$\int_0^T\cos(2\pi f_k t + \phi_k)\cos(2\pi f_j t + \phi_j)\,dt = 0 \tag{11.2-10}$$

where $f_k - f_j = n/T$, $n = 1, 2, \ldots, N-1$, independent of the values of the phases
$\phi_k$ and $\phi_j$. Thus, we construct orthogonal frequency-division multiplexed (OFDM)
signals. In other words, OFDM is a special type of multicarrier modulation in which
the subcarriers of the corresponding subchannels are mutually orthogonal, as defined
in Equation 11.2-10.

FIGURE 11.2-2
Subdivision of the channel bandwidth $W$ into narrowband subchannels of
equal width $\Delta f$.

Multicarrier modulation (OFDM) is widely used in both wireline and radio channels.
For example, OFDM has been adopted as a standard for digital audio broadcast
applications and wireless local area networks based on the IEEE 802.11 standard.

A particularly suitable application of OFDM is in digital transmission over copper
wire subscriber loops. The typical channel attenuation characteristics for such subscriber
lines are illustrated in Figure 11.2-3. We observe that the attenuation increases
rapidly as a function of frequency. This characteristic makes it extremely difficult to
achieve a high transmission rate with a single modulated carrier and an equalizer at the
receiver. The ISI penalty in performance is very large. On the other hand, OFDM with
optimum power distribution provides the potential for a higher transmission rate.

FIGURE 11.2-3
Attenuation characteristic of a 24-gauge 12,000-ft
polyethylene-insulated cable loop. [From Werner (1991).]

The dominant noise in transmission over subscriber lines is crosstalk interference 
from signals carried on other telephone lines located in the same cable. The power 
distribution of this type of noise is also frequency-dependent, which can be taken into 
consideration in the allocation of the available transmitted power. 

A design procedure for a multicarrier QAM system for a nonideal linear filter chan- 
nel has been given by Kalet (1989). In this procedure, the overall bit rate is maximized, 
through the design of an optimal power division among the subcarriers and an optimum 
selection of the number of bits per symbol (sizes of the QAM signal constellations) for 
each subcarrier, under an average power constraint and under the constraint that the 
symbol error probabilities for all subcarriers are equal. 
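The orthogonality of the subcarriers used in this section, for any choice of phases, can be checked numerically. The sketch below integrates the product of two subcarriers over one symbol interval with a midpoint rule. It assumes subcarrier frequencies that are integer multiples of $1/T$ (so that both the difference and the sum of any two frequencies are multiples of $1/T$); the function name, the sample count, and the particular indices and phases are illustrative.

```python
import math

def subcarrier_inner_product(k, j, phi_k, phi_j, T=1.0, samples=4096):
    """Midpoint-rule estimate of the integral over [0, T] of
    cos(2*pi*f_k*t + phi_k) * cos(2*pi*f_j*t + phi_j), with f_k = k/T and f_j = j/T."""
    dt = T / samples
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) * dt
        total += (math.cos(2.0 * math.pi * k / T * t + phi_k)
                  * math.cos(2.0 * math.pi * j / T * t + phi_j))
    return total * dt

cross_term = subcarrier_inner_product(3, 5, 0.7, -1.2)  # distinct subcarriers, arbitrary phases
self_term = subcarrier_inner_product(4, 4, 0.3, 0.3)    # same subcarrier: energy T/2
```

The cross term vanishes to within rounding error regardless of the phases, whereas the self term equals $T/2$.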

11.2-4 Modulation and Demodulation in an OFDM System 

In an OFDM system with $N$ subchannels, the symbol rate $1/T$ is reduced by a factor
of $N$ relative to the symbol rate on a single-carrier system that employs the entire
bandwidth $W$ and transmits data at the same rate as OFDM. Hence, the symbol interval
in the OFDM system is $T = NT_s$, where $T_s$ is the symbol interval in the single-carrier
system. By selecting $N$ to be sufficiently large, the symbol interval $T$ can
be made significantly larger than the time duration of the channel time dispersion.
Thus, intersymbol interference can be made arbitrarily small through the selection
of $N$. In other words, each subchannel appears to have a fixed frequency response
$C(f_k)$, $k = 0, 1, \ldots, N-1$.

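Suppose that each subcarrier is modulated with $M$-ary QAM. Then the signal on
the $k$th subcarrier may be expressed as

$$u_k(t) = \sqrt{\frac{2}{T}}A_{kc}\cos 2\pi f_k t - \sqrt{\frac{2}{T}}A_{ks}\sin 2\pi f_k t = \mathrm{Re}\left[\sqrt{\frac{2}{T}}A_k e^{j\theta_k}e^{j2\pi f_k t}\right] = \mathrm{Re}\left[\sqrt{\frac{2}{T}}X_k e^{j2\pi f_k t}\right] \tag{11.2-11}$$

where $X_k = A_k e^{j\theta_k}$ is the signal point from the QAM signal constellation that is
transmitted on the $k$th subcarrier, $A_k = \sqrt{A_{kc}^2 + A_{ks}^2}$, and $\theta_k = \tan^{-1}(A_{ks}/A_{kc})$. The
energy per symbol $\mathcal{E}_s$ has been absorbed into $\{X_k\}$.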

When the number of subchannels is large, so that the subchannels are sufficiently
narrowband, each subchannel can be characterized by a fixed frequency response
$C(f_k)$, $k = 0, 1, \ldots, N-1$. In general, $C(f_k)$ is complex-valued and may be expressed as

$$C(f_k) = C_k = |C_k|e^{j\phi_k} \tag{11.2-12}$$

Hence, the received signal on the $k$th subchannel is

$$r_k(t) = \sqrt{\frac{2}{T}}|C_k|A_{kc}\cos(2\pi f_k t + \phi_k) - \sqrt{\frac{2}{T}}|C_k|A_{ks}\sin(2\pi f_k t + \phi_k) + n_k(t) = \mathrm{Re}\left[\sqrt{\frac{2}{T}}C_kX_ke^{j2\pi f_k t}\right] + n_k(t) \tag{11.2-13}$$


where $n_k(t)$ represents the additive noise in the $k$th subchannel. We assume that $n_k(t)$
is zero-mean Gaussian and spectrally flat across the bandwidth of the $k$th subchannel.
We also assume that the channel parameters $|C_k|$ and $\phi_k$ are known at the receiver.
(These parameters are usually estimated by initially transmitting the unmodulated carrier
$\cos 2\pi f_k t$ and observing the received signal $|C_k|\cos(2\pi f_k t + \phi_k)$.)

The demodulation of the received signal in the $k$th subchannel may be accomplished
by cross-correlating $r_k(t)$ with the two basis functions, based on knowledge of the carrier
phase $\{\phi_k\}$ at the receiver,

$$\psi_1(t) = \sqrt{\frac{2}{T}}\cos(2\pi f_k t + \phi_k), \quad 0 \le t \le T$$

$$\psi_2(t) = -\sqrt{\frac{2}{T}}\sin(2\pi f_k t + \phi_k), \quad 0 \le t \le T \tag{11.2-14}$$

and sampling the output of the cross-correlators at $t = T$. Thus, we obtain the received
signal vector

$$y_k = \left(|C_k|A_{kc} + \eta_{kr},\; |C_k|A_{ks} + \eta_{ki}\right) \tag{11.2-15}$$

which can also be expressed as the complex number

$$Y_k = |C_k|X_k + \eta_k \tag{11.2-16}$$

where $\eta_k = \eta_{kr} + j\eta_{ki}$ represents the additive noise.

The scaling of the transmitted symbol by the channel gain $|C_k|$ can be removed by
dividing $Y_k$ by $|C_k|$. Thus, we obtain

$$Y_k' = Y_k/|C_k| = X_k + \eta_k' \tag{11.2-17}$$

where $\eta_k' = \eta_k/|C_k|$. The normalized variable $Y_k'$ is passed to the detector, which
computes the distance metrics between $Y_k'$ and each of the possible signal points in the
QAM signal constellation and selects the signal point resulting in the smallest distance.
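The normalization and minimum-distance decision for one subchannel can be sketched in a few lines. The function name, the 4-point constellation, and the gain and noise values below are hypothetical and serve only to illustrate the two steps.

```python
def detect_subchannel(y, gain_magnitude, constellation):
    """Divide the correlator output by |C_k| and pick the nearest constellation point."""
    y_normalized = y / gain_magnitude
    return min(constellation, key=lambda point: abs(y_normalized - point))

qam4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]    # illustrative 4-QAM signal points
sent = -1 + 1j
gain = 0.25                                   # assumed known |C_k|
noise = 0.05 - 0.03j                          # one hypothetical noise sample
received = gain * sent + noise                # Y_k = |C_k| X_k + eta_k
decision = detect_subchannel(received, gain, qam4)
```

Note that the division scales the noise by $1/|C_k|$ as well, so a strongly attenuated subchannel sees proportionally larger noise at the detector input.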

From this description, it is clear that two cross-correlators or two matched filters
are required to demodulate the received signal in each subchannel. Therefore, if the
OFDM signal consists of $N$ subchannels, the implementation of the OFDM demodulator
requires a parallel bank of $2N$ cross-correlators or $2N$ matched filters. Furthermore,
the modulation process for generating the OFDM signal can also be viewed as exciting
a bank of $2N$ parallel filters with symbols taken from an $M$-ary QAM signal
constellation.

The bank of $2N$ parallel filters that generates the modulated signal at the transmitter
and demodulates the received signal is equivalent to the computation of the discrete
Fourier transform (DFT) and its inverse. Since an efficient computation of the DFT
is the fast Fourier transform (FFT) algorithm, a more efficient implementation of the
modulation and demodulation processes when $N$ is large, e.g., $N > 32$, is by means of
the FFT algorithm. In the next section, we describe the implementation of the modulator
and demodulator in an OFDM system that uses the FFT algorithm to compute the DFT.

Since the signals transmitted on the $N$ subchannels of the OFDM system are
synchronized, the received signals on any pair of subchannels are orthogonal over the
interval $0 \le t \le T$. If the subchannel gains $|C_k|$, $0 \le k \le N-1$, are sufficiently
different across the channel bandwidth, subchannels that yield a higher SNR due to a
lower attenuation can be modulated to carry more bits per symbol than subchannels that
yield a lower SNR (high attenuation). Consequently, QAM with different constellation
sizes can be used on the different subchannels of an OFDM system. This assignment
of different constellation sizes to different subchannels is generally done in practice.


11.2-5 An FFT Algorithm Implementation of an OFDM System 

In this section we describe a multicarrier communication system that employs the 
fast Fourier transform algorithm to synthesize the signal at the transmitter and to demod- 
ulate the received signal at the receiver. The FFT is simply the efficient computational 
tool for implementing the DFT. 

Figure 1 1 .2^4 illustrates a block diagram of a multicarrier communication system. 
A serial-to-parallel buffer segments the information sequence into frames of Nf bits. 
The Nf bits in each frame are parsed into N groups, where the ith group is assigned b, 
bits, and 

N 

Y b > = Nf (11.2-18) 

;=i 

Each group may be encoded separately, so that the number of output bits from the 
encoder for the zth group is m > bi. 

It is convenient to view the multicarrier modulation as consisting of $\tilde{N}$ independent
QAM channels, each operating at the same symbol rate $1/T$, but each channel having
a distinct QAM constellation; i.e., the $i$th channel will employ $M_i = 2^{b_i}$ signal points.

FIGURE 11.2-4

Multicarrier communication system.


We denote the complex-valued signal points corresponding to the information symbols
on the subchannels by $X_k$, $k = 0, 1, \ldots, \tilde{N}-1$. To modulate the $\tilde{N}$ subcarriers by the
information symbols $\{X_k\}$, we employ the inverse DFT (IDFT).

However, if we compute the $\tilde{N}$-point IDFT of $\{X_k\}$, we obtain a complex-valued
time series, which is not equivalent to $\tilde{N}$ QAM-modulated subcarriers. Instead, we
create $N = 2\tilde{N}$ information symbols by defining

$$X_{N-k} = X_k^*, \qquad k = 1, \ldots, \tilde{N}-1 \tag{11.2-19}$$

and $X'_0 = \mathrm{Re}(X_0)$, $X_{\tilde{N}} = \mathrm{Im}(X_0)$. Thus, the symbol $X_0$ is split into two parts, both
real. Then the $N$-point IDFT yields the real-valued sequence

$$x_n = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k e^{j2\pi nk/N}, \qquad n = 0, 1, \ldots, N-1 \tag{11.2-20}$$

where $1/\sqrt{N}$ is simply a scale factor.
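
The symmetry construction of Equations 11.2-19 and 11.2-20 can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `hermitian_extend` and `idft` and the four example 4-QAM symbols are illustrative assumptions and do not come from the text.

```python
import cmath
import math

def hermitian_extend(X):
    """Build N = 2*Nt symbols from Nt complex symbols X[0..Nt-1].

    Follows Eq. 11.2-19: Y[N-k] = conj(X[k]) for k = 1..Nt-1, with X[0]
    split into Re(X[0]) at index 0 and Im(X[0]) at index Nt.
    """
    Nt = len(X)
    N = 2 * Nt
    Y = [0j] * N
    Y[0] = complex(X[0].real, 0.0)
    Y[Nt] = complex(X[0].imag, 0.0)
    for k in range(1, Nt):
        Y[k] = X[k]
        Y[N - k] = X[k].conjugate()
    return Y

def idft(Y):
    """N-point IDFT with the 1/sqrt(N) scale factor of Eq. 11.2-20."""
    N = len(Y)
    return [sum(Y[k] * cmath.exp(2j * math.pi * n * k / N) for k in range(N))
            / math.sqrt(N) for n in range(N)]

# Hypothetical example: Nt = 4 subchannels, each carrying one 4-QAM point.
X = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
x = idft(hermitian_extend(X))

# The imaginary parts vanish up to rounding, so the sequence is real-valued.
max_imag = max(abs(v.imag) for v in x)
print(len(x), max_imag)
```

The direct summation costs on the order of N squared operations and is used here only for clarity; an FFT routine would replace `idft` in practice.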

The sequence $\{x_n, 0 \le n \le N-1\}$ corresponds to the samples of the sum $x(t)$ of
$\tilde{N} = N/2$ subcarrier signals, which is expressed as

$$x(t) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k e^{j2\pi kt/T}, \qquad 0 \le t \le T \tag{11.2-21}$$

where $T$ is the symbol duration. We observe that the subcarrier frequencies are
$f_k = k/T$, $k = 0, 1, \ldots, \tilde{N}$. Furthermore, the discrete-time sequence $\{x_n\}$ in Equation
11.2-20 represents the samples of $x(t)$ taken at times $t = nT/N$, where $n = 0,
1, \ldots, N-1$.

The computation of the IDFT of the data $\{X_k\}$ as given in Equation 11.2-20 may
be viewed as multiplication of each data point $X_k$ by a corresponding vector

$$\mathbf{v}_k = [v_{k0} \;\; v_{k1} \;\; \cdots \;\; v_{k(N-1)}] \tag{11.2-22}$$

where

$$v_{kn} = \frac{1}{\sqrt{N}} e^{j(2\pi/N)kn} \tag{11.2-23}$$

as illustrated in Figure 11.2-5. In any case, the computation of the DFT is performed
efficiently by the use of the FFT algorithm.

FIGURE 11.2-5

Signal synthesis for multicarrier modulation
based on inverse DFT.

In practice, the signal samples $\{x_n\}$ are passed through a digital-to-analog (D/A)
converter whose output, ideally, would be the signal waveform $x(t)$. The output of the
channel is the waveform

$$r(t) = x(t) \ast c(t) + n(t) \tag{11.2-24}$$

where $c(t)$ is the impulse response of the channel and $\ast$ denotes convolution. By selecting
the bandwidth $\Delta f$ of each subchannel to be very small, the symbol duration
$T = 1/\Delta f$ is large compared with the channel time dispersion. To be specific, let us
assume that the channel dispersion spans $\nu + 1$ signal samples where $\nu \ll N$. One
way to avoid the effect of ISI is to insert a time guard band of duration $\nu T/N$ between
transmissions of successive blocks.

An alternative method that avoids ISI is to append a cyclic prefix to each block
of $N$ signal samples $\{x_0, x_1, \ldots, x_{N-1}\}$. The cyclic prefix for this block of samples
consists of the samples $x_{N-\nu}, x_{N-\nu+1}, \ldots, x_{N-1}$. These new samples are appended to
the beginning of each block. Note that the addition of the cyclic prefix to the block
of data increases the length of the block to $N + \nu$ samples, which may be indexed
from $n = -\nu, \ldots, N-1$, where the first $\nu$ samples constitute the prefix. Then if
$\{c_n, 0 \le n \le \nu\}$ denotes the sampled channel impulse response, its convolution with
$\{x_n, -\nu \le n \le N-1\}$ produces $\{r_n\}$, the received sequence. We are interested in the
samples of $\{r_n\}$ for $0 \le n \le N-1$, from which we recover the transmitted sequence by
using the $N$-point DFT for demodulation. Thus, the first $\nu$ samples of $\{r_n\}$ are discarded.

From a frequency-domain viewpoint, when the channel impulse response is $\{c_n, 0 \le
n \le \nu\}$, its frequency response at the subcarrier frequencies $f_k = k/N$ is

$$C_k \equiv C\!\left(\frac{2\pi k}{N}\right) = \sum_{n=0}^{\nu} c_n e^{-j2\pi nk/N} \tag{11.2-25}$$

Because the cyclic prefix serves as a time guard band against interference, successive
blocks (frames) of the transmitted information sequence do not interfere and, hence,
the demodulated sequence may be expressed as

$$\hat{X}_k = C_k X_k + \eta_k, \qquad k = 0, 1, \ldots, N-1 \tag{11.2-26}$$

where $\{\hat{X}_k\}$ is the output of the $N$-point DFT demodulator and $\eta_k$ is the additive noise
corrupting the signal. We note that by selecting $N \gg \nu$, the rate loss due to the cyclic
prefix can be rendered negligible.
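
The effect of the cyclic prefix can be illustrated with a small noise-free experiment. The sketch below is an assumption-laden toy example (block length 8, a three-tap channel, arbitrary sample values, and helper names chosen here for illustration). It checks that, once the first $\nu$ received samples are discarded, the DFT of the received block equals the product of the channel DFT and the DFT of the transmitted block, which is the noise-free form of Equation 11.2-26.

```python
import cmath
import math

def dft(x):
    """Unnormalized N-point DFT: sum over n of x[n] * exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]

def add_cyclic_prefix(x, nu):
    """Prepend the last nu samples x[N-nu], ..., x[N-1] to the block."""
    return x[len(x) - nu:] + x

def channel(s, c):
    """Noise-free linear convolution of s with taps c, truncated to len(s)."""
    return [sum(c[m] * s[n - m] for m in range(len(c)) if n - m >= 0)
            for n in range(len(s))]

N, nu = 8, 2                                   # assumed toy dimensions
x = [complex(n % 3 - 1, (2 * n) % 5 - 2) for n in range(N)]  # arbitrary block
c = [1.0, 0.5, -0.25]                          # nu + 1 = 3 channel taps

# Transmit the prefixed block, then discard the first nu received samples.
r = channel(add_cyclic_prefix(x, nu), c)[nu:]

X = dft(x)
C = dft(c + [0.0] * (N - len(c)))              # C_k as in Eq. 11.2-25
R = dft(r)

# The same DFT normalization is used on both sides, so R_k = C_k * X_k.
err = max(abs(R[k] - C[k] * X[k]) for k in range(N))
efficiency = N / (N + nu)                      # fraction of samples carrying data
print(err, efficiency)
```

With these toy values only 8 of every 10 transmitted samples carry data; the loss shrinks as $N$ grows relative to $\nu$.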

As shown in Figure 11.2-4, the information is demodulated by computing the 
DFT of the received signal after it has been passed through an analog-to-digital (A/D) 
converter. The DFT computation may be viewed as a multiplication of the received 
signal samples $\{r_n\}$ from the A/D converter by $\mathbf{v}_k^*$, where $\mathbf{v}_k$ is defined in Equation
11.2-22. As in the case of the modulator, the DFT computation at the demodulator is 
performed efficiently by use of the FFT algorithm. 
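
The vector view of the modulator and demodulator can be written out directly. The sketch below is only illustrative: the function names and the four example symbols are assumptions. It forms the transmitted block as a weighted sum of the vectors $\mathbf{v}_k$ of Equation 11.2-22 and recovers each symbol by taking the inner product of the received block with the conjugate vector.

```python
import cmath
import math

def v(k, N):
    """Vector v_k with elements v_kn = exp(j*2*pi*k*n/N) / sqrt(N)."""
    return [cmath.exp(2j * math.pi * k * n / N) / math.sqrt(N) for n in range(N)]

def modulate(X):
    """x = sum over k of X_k * v_k, i.e. the N-point IDFT of Eq. 11.2-20."""
    N = len(X)
    x = [0j] * N
    for k in range(N):
        vk = v(k, N)
        for n in range(N):
            x[n] += X[k] * vk[n]
    return x

def demodulate(r):
    """Output k is the inner product of r with conj(v_k), i.e. the N-point DFT."""
    N = len(r)
    out = []
    for k in range(N):
        vk = v(k, N)
        out.append(sum(r[n] * vk[n].conjugate() for n in range(N)))
    return out

# Hypothetical symbols on an ideal channel (no noise, no dispersion).
X = [3 + 0j, 1 - 1j, -1 + 0j, 1 + 1j]
X_back = demodulate(modulate(X))
roundtrip_error = max(abs(a - b) for a, b in zip(X, X_back))
print(roundtrip_error)
```

Because the vectors are orthonormal, the round trip returns the original symbols up to rounding error.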

It is a simple matter to estimate and compensate for the channel factors $\{C_k\}$ prior
to passing the data to the detector and decoder. A training signal consisting of either a
known modulated sequence on each of the subcarriers or unmodulated subcarriers may
be used to measure the $\{C_k\}$ at the receiver. If the channel parameters vary slowly with
time, it is also possible to track the time variations by using the decisions at the output 
of the detector or the decoder, in a decision-directed fashion. Thus, the multicarrier 
system can be rendered adaptive. 
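
One simple way to realize the training-based estimate is a per-subcarrier division. The text does not prescribe a particular estimator, so the sketch below is an assumption: it divides the received training value by the known training symbol on each subcarrier and then uses the resulting estimates as one-tap equalizers. The channel gains and symbols are made-up values, and noise is omitted.

```python
def estimate_channel(R_train, X_train):
    """Per-subcarrier estimate C_k = R_k / X_k (training symbols must be nonzero)."""
    return [r / x for r, x in zip(R_train, X_train)]

def equalize(R, C_hat):
    """Compensate each subcarrier by dividing out its estimated gain."""
    return [r / c for r, c in zip(R, C_hat)]

# Hypothetical channel gains on four subcarriers.
C_true = [1.0 + 0.0j, 0.8 - 0.3j, 0.5 + 0.5j, 0.2 - 0.1j]

# Training phase: known symbols are sent and the gains are measured.
X_train = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R_train = [c * x for c, x in zip(C_true, X_train)]
C_hat = estimate_channel(R_train, X_train)

# Data phase: the estimates compensate the received symbols before detection.
X_data = [-1 - 1j, 1 + 1j, 1 - 1j, -1 + 1j]
R_data = [c * x for c, x in zip(C_true, X_data)]
X_hat = equalize(R_data, C_hat)
max_error = max(abs(a - b) for a, b in zip(X_hat, X_data))
print(max_error)
```

In a decision-directed mode, `X_train` would be replaced by the detector decisions from the previous block.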

By measuring the SNR in each subchannel, one can optimize the transmission rate 
by allocating the average transmitted power and the number of bits to be carried by 
each subcarrier. The SNR per subchannel is defined as

$$\mathrm{SNR}_k = \frac{T P_k |C_k|^2}{\sigma_{nk}^2} \tag{11.2-27}$$

where $T$ is the symbol duration, $P_k$ is the average power allocated to the $k$th subchannel,
$|C_k|^2$ is the magnitude squared of the frequency response of the $k$th subchannel, and $\sigma_{nk}^2$
is the variance of the noise in the $k$th subchannel. Based on these SNR measurements,
the capacity of each subchannel may be determined as described in Section 11.2-2. 
Furthermore, system performance may be optimized by selecting the bit and power 
allocation for each subchannel as described below and in the papers by Chow et al. 
(1995) and Fischer and Huber (1996). 

Multicarrier QAM of the type described above has been implemented for a variety 
of applications, including high-speed transmission over telephone lines, such as digital 
subscriber lines. 

Other types of implementation besides the DFT are possible. For example, a digital
filter bank that basically performs the DFT may be substituted for the FFT-based
implementation when the number of subcarriers is small, e.g., N < 32. For a large 
number of subcarriers, e.g., N > 32, the FFT-based systems are computationally more 
efficient. 


11.2-6 Spectral Characteristics of Multicarrier Signals 

FIGURE 11.2-6

An example of the magnitude of the frequency response of adjacent subchannel filters in an
OFDM system for $f \in (0, 0.06\,N/T)$ and $N = 64$. [From Cherubini et al. (2002) IEEE.]

Although the signals transmitted on the subcarriers of an OFDM system are mutually
orthogonal in the time domain, these signals have significant overlap in the frequency
domain. This can be observed by computing the Fourier transform of the signal

$$u_k(t) = \mathrm{Re}\left[A_k e^{j(2\pi f_k t + \theta_k)}\right] = A_k \cos(2\pi f_k t + \theta_k), \qquad 0 \le t \le T \tag{11.2-28}$$

for several values of $k$. Figure 11.2-6 illustrates the magnitude spectrum $|U_k(f)|$ for
several adjacent subcarriers. Note the large spectral overlap of the main lobes. Also
note that the first sidelobe in the spectrum is only 13 dB down from the main lobe.
Hence, there is a significant amount of spectral overlap among the signals transmitted
on different subcarriers. Nevertheless, these signals are orthogonal when transmitted
synchronously in time.
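
Both observations can be reproduced numerically. The sketch below rests on stated assumptions: unit amplitudes, $T = 1$, arbitrary phases, and a spectrum near each subcarrier approximated by the sinc shape of a rectangular window of duration $T$. It integrates the product of two subcarriers over one symbol interval and locates the first sidelobe of the sinc magnitude.

```python
import math

def inner_product(k, j, theta_k, theta_j, T=1.0, M=1000):
    """Midpoint-rule integral over [0, T] of cos(2*pi*k*t/T + theta_k)
    times cos(2*pi*j*t/T + theta_j)."""
    dt = T / M
    total = 0.0
    for m in range(M):
        t = (m + 0.5) * dt
        total += (math.cos(2 * math.pi * k * t / T + theta_k)
                  * math.cos(2 * math.pi * j * t / T + theta_j))
    return total * dt

def sinc_mag(u):
    """|sin(pi*u) / (pi*u)|, with u = (f - f_k) * T the normalized frequency offset."""
    return 1.0 if u == 0 else abs(math.sin(math.pi * u) / (math.pi * u))

# Adjacent subcarriers k = 3 and k = 4 with arbitrary phases are orthogonal.
cross = inner_product(3, 4, 0.3, 1.1)
energy = inner_product(3, 3, 0.3, 0.3)        # equals T/2 for unit amplitude

# First sidelobe: largest value of the sinc magnitude for 1 <= u <= 2.
peak = max(sinc_mag(1.0 + i / 10000.0) for i in range(10001))
sidelobe_db = 20.0 * math.log10(peak)
print(cross, energy, sidelobe_db)
```

The cross term vanishes even though the main lobes overlap, and the sidelobe level comes out close to the 13 dB figure quoted above.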

The large spectral overlap of the OFDM signals has various ramifications when 
the communication channel is a radio channel and the receiving terminal is mobile, as 
in the case of cellular radio communications. In such mobile radio communications, 
the transmitted signal is imparted with Doppler frequency shifts or Doppler spreading, 
which destroys the orthogonality among the subcarriers and, as a consequence, results 
in interchannel interference (ICI). The ICI produces a significant degradation in the 
performance (error probability) of the OFDM system. The degree of performance 
degradation is proportional to the speed at which the receiving terminal is moving. 
In general, the degradation is small when the terminal is moving at pedestrian speed.
This is the case, for example, in wireless LANs that employ OFDM signals with large
($M = 64$) QAM signal constellations.

FIGURE 11.2-7

Filter bank implementation of OFDM receiver.

The detrimental effects of ICI in a multicarrier system, such as OFDM, can be
significantly reduced by employing a bank of parallel filters in the implementation
of the system, as illustrated in Figure 11.2-7. In such an implementation, the prototype
filter $H_0(f)$ and, hence, its frequency-shifted versions $H_k(f) = H_0(f - k/T)$
are designed to have sharp cutoff frequency-response characteristics. Consequently, a
Doppler frequency spread that is small compared to $1/2T$, or equivalently, compared to
the bandwidth of the prototype filter $H_0(f)$, will result in negligible ICI. For example,
Figure 11.2-8 illustrates the frequency-response characteristics in such a filter bank
implementation. Note that the filter sidelobes are approximately 70 dB below the main
lobe, and the spectral overlap between adjacent filters is negligible. Such filter characteristics
provide significant immunity against ICI that may be encountered in mobile
radio communication environments.

The price paid for achieving this immunity to ICI caused by Doppler spreading
is the added complexity in the implementation of the filters $\{H_k(f)\}$ at the transmitter
and the receiver. An efficient implementation for the filter bank, based on multirate
digital signal processing methods, has been described in the papers by Cherubini
et al. (2000, 2002). The resulting filter bank implementation of the multicarrier system is
called filtered multitone (FMT) modulation. The spectral characteristics shown in Figure
11.2-8 correspond to filter frequency responses in an FMT multicarrier modulation
system.


11.2-7 Bit and Power Allocation in Multicarrier Modulation 

We now consider a bit and power allocation procedure to optimize the performance of
a multicarrier system transmitting over a linear time-invariant channel with AWGN.
We assume that there are $N$ subcarriers and that the modulation on each subcarrier is
QAM, where $M_i = 2^{b_i}$ is the constellation size and $b_i$ is the number of bits transmitted
on the $i$th subcarrier in the frame interval of $T$ seconds. Thus, the total bit rate is

$$R_b = \frac{1}{T} \sum_{i=1}^{N} b_i \tag{11.2-29}$$

The power allocated to the $i$th subcarrier is $P_i$, and the total transmitted power is

$$P = \sum_{i=1}^{N} P_i \tag{11.2-30}$$

which is constrained to be a fixed value.

The bandwidth of each subchannel is assumed to be sufficiently narrow that the
complex-valued channel gain $C(f_i)$ is constant across the frequency band of the $i$th
subchannel. For convenience, we also assume that the spectral density of the additive
Gaussian noise in the $N$ subchannels is identical.

FIGURE 11.2-8

An example of the magnitude of the frequency response of adjacent subchannel filters in an
FMT system for $f \in (0, 0.06\,N/T)$ and design parameters $N = 64$. [From Cherubini et al. (2002)
IEEE.]

In selecting the bit and power allocation among the $N$ subchannels, our objective
is to maximize the bit rate $R_b$ for a specified error probability that is the same across
the $N$ subchannels. It is convenient to use the symbol error probability for QAM as the
performance index and to focus on the low-error-rate (high-SNR) region. The symbol
error probability for QAM at low error rates is well approximated by the expression

$$P_e \approx 4Q\left(\sqrt{\frac{3P_i|C_i|^2}{N_0(M_i - 1)}}\right) \tag{11.2-31}$$

where $P_e$ is the desired symbol error probability and $C_i = C(f_i)$. The multiplier in
front of the $Q$ function represents the number of nearest neighbors in a rectangular
QAM signal constellation. Therefore, $P_i$ and $M_i$ are selected such that

$$\frac{3P_i|C_i|^2}{N_0(M_i - 1)} = \left[Q^{-1}\left(\frac{P_e}{4}\right)\right]^2 \tag{11.2-32}$$

It has been shown by Kalet (1989) that transmitting equal power across all subchannels
for which $|C_i|^2/N_0$ is sufficiently large to support at least an $M = 4$ signal
constellation at the desired low symbol error probability results in near optimum performance.
Hence, we may begin by allocating equal power among the subchannels and
deleting all subcarriers which cannot support at least an $M = 4$ signal constellation at
the desired error probability. Then we allocate the total transmit power equally among
the remaining subchannels and compute the value of $M_i$ that satisfies the desired error
probability given by Equation (11.2-32).

At this point, we may simply truncate the values of $\{M_i\}$ to $\{\tilde{M}_i\}$ such that

$$b_i = \log_2 \tilde{M}_i, \qquad i = 1, 2, \ldots, N \tag{11.2-33}$$

are integers. However, when the number of subchannels is large, this simple allocation
procedure may result in a significant loss in rate. Alternatively, we may use the
unquantized value of each $M_i$ that satisfies the desired symbol error probability and
either round up to the next-higher power of 2 or truncate to the next-lower power of
2, if the fractional part of $b_i = \log_2 M_i$ is greater than 1/2 or lower than 1/2,
respectively. The allocated power for each subchannel is then adjusted accordingly to
satisfy the desired error probability. This power allocation procedure may be performed
sequentially, beginning with the subchannel having the largest $|C_i|^2/N_0$, where at each
step the remaining power is allocated equally among the remaining subchannels. Thus,
the total power allocation is kept constant.

As an example, let us consider high-speed digital transmission over wirelines that
connect a telephone subscriber’s premises to a telephone central office. These wireline
channels typically consist of unshielded twisted-pair wire and are commonly called
the subscriber local loop. The desire to provide high-speed Internet access to homes
and businesses over the telephone subscriber loop has resulted in the development of a
standard for digital transmission based on OFDM with QAM as the basic modulation
method on each of the subcarriers.

The usable bandwidth of a twisted-pair subscriber loop wire is primarily limited by
the distance between the subscriber and the central telephone office, i.e., the length of
the wire, and by crosstalk interference from other lines in the same cable. For example,
a 3-km twisted-pair wireline may have a usable bandwidth of approximately 1.2 MHz.
Since the need for high-speed digital transmission is usually in the direction from the
central office to the subscriber (the downlink) and the bandwidth is relatively small,
the major part of the bandwidth is allocated to the downlink. Consequently, the digital
transmission on the subscriber loop is asymmetric, and this transmission mode is called
ADSL (asymmetric digital subscriber line).

In the ADSL standard, the downlink and the uplink maximum data rates are specified
as 6.8 Mbps and 640 kbps, respectively, for subscriber lines of approximately
12,000 ft in length, and 1.544 Mbps and 176 kbps, respectively, for subscriber lines of
approximately 18,000 ft in length. The low part of the frequency band (0-25 kHz)
is reserved for the telephone voice transmission, which requires a nominal bandwidth
of 4 kHz. Hence, the frequency band of the subscriber line is separated into
two frequency bands via two filters (lowpass and highpass) that have cutoff frequencies
of 25 kHz. Thus, the low-end frequency for digital transmission is 25 kHz. The
ADSL standard specifies that the frequency range of 25 kHz to 1.1 MHz must be subdivided
into 256 parallel OFDM subchannels. Hence, the size of the DFT and IDFT
in the system implementation shown in Figure 11.2-4 is $N = 512$. A sampling rate
$f_s = 2.208$ MHz is specified, so that the high-end frequency in the signal spectrum
is $f_s/2 = 1.104$ MHz. The frequency spacing between two adjacent subcarriers is
$\Delta f = 1.104 \times 10^6/256 = 4.3125$ kHz. The channel time dispersion is suppressed by
using a cyclic prefix of $N/16 = 32$ samples.
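
The ADSL numbers quoted above are tied together by simple arithmetic, which the following sketch reproduces. The variable names are illustrative assumptions. The block duration in the last line is not stated in the text; it is derived here from the quoted sampling rate, DFT size, and prefix length.

```python
fs = 2.208e6                 # sampling rate in Hz, as quoted
n_sub = 256                  # number of OFDM subchannels, as quoted

N = 2 * n_sub                # DFT/IDFT size for a real-valued output sequence
f_high = fs / 2              # high-end frequency of the signal spectrum
delta_f = f_high / n_sub     # spacing between adjacent subcarriers
prefix = N // 16             # cyclic prefix length in samples

# Derived quantity (not quoted in the text): duration of one prefixed block.
block_duration = (N + prefix) / fs

print(N, f_high, delta_f, prefix, block_duration)
```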

By measuring the signal-to-noise ratio (SNR) for each subchannel at the receiver 
and communicating this information to the transmitter via the uplink, the transmitter 
can select the QAM constellation size in bits per symbol to achieve a desired error 
probability in each subchannel. The ADSL standard specifies a minimum bit load 
of 2 bits per subchannel, which corresponds to QPSK modulation. If a subchannel 
cannot support QPSK at the desired error probability, no information is transmitted 
over that subchannel. As an example, Figure 11.2-9 illustrates the received SNR as 
measured by the receiver for each subchannel and the corresponding number of bits per 
symbol selected from a QAM signal constellation. Note that the SNR in subchannels 
220-256 is too low to support QPSK modulation; hence, no data are transmitted on 
these subchannels. ADSL channel characteristics and the design of OFDM modems 
based on the ADSL standard are treated in detail in the books by Bingham (2000) and 
Starr et al. (1999). The use of OFDM with variable size QAM signal constellations for 
each of the subcarriers is sometimes called discrete multitone (DMT) modulation. 
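
The equal-power allocation procedure described earlier in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a standardized loading algorithm: the error-probability approximation of Equation 11.2-31 is solved for the constellation size, subchannels that cannot support $M = 4$ at equal power are deleted, the power is redistributed, and the bit counts are truncated to integers. The function names and the example values of $|C_i|^2/N_0$ are hypothetical, and the power adjustment after truncation is omitted.

```python
import math
from statistics import NormalDist

def q_inv(p):
    """Inverse of the Gaussian tail function Q(x)."""
    return NormalDist().inv_cdf(1.0 - p)

def unquantized_M(P_i, gain_over_N0, Pe):
    """Constellation size from Pe = 4*Q(sqrt(3*P_i*|C_i|^2 / (N0*(M_i - 1))))."""
    g = q_inv(Pe / 4.0) ** 2
    return 1.0 + 3.0 * P_i * gain_over_N0 / g

def allocate_bits(gains_over_N0, P_total, Pe):
    """Equal-power loading with truncation of log2(M_i) to an integer."""
    N = len(gains_over_N0)
    # Step 1: equal power on all subchannels; delete those below M = 4.
    active = [i for i in range(N)
              if unquantized_M(P_total / N, gains_over_N0[i], Pe) >= 4.0]
    bits = [0] * N
    if not active:
        return bits
    # Step 2: redistribute the total power equally among the survivors.
    P_each = P_total / len(active)
    for i in active:
        M_i = unquantized_M(P_each, gains_over_N0[i], Pe)
        bits[i] = int(math.floor(math.log2(M_i)))
    return bits

# Hypothetical values of |C_i|^2 / N0 for four subchannels.
gains = [2000.0, 500.0, 100.0, 5.0]
bits = allocate_bits(gains, P_total=4.0, Pe=1e-6)
print(bits, sum(bits))
```

In this example the weakest subchannel is deleted and its share of the power is spread over the other three. Truncation may yield odd bit counts, which would call for non-square constellations; a practical design might restrict the allowed values.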


11.2-8 Peak-to-Average Ratio in Multicarrier Modulation 

FIGURE 11.2-9

Example of a DSL frequency response and bit allocation on the OFDM subchannels.

A major problem with multicarrier modulation is the relatively high peak-to-average
ratio (PAR) that is inherent in the transmitted signal. In general, large signal peaks
occur in the transmitted signal when the signals in many of the various subchannels
add constructively in phase. Such large signal peaks may result in clipping of the
signal voltage in a D/A converter when the multicarrier signal is synthesized digitally,
and/or may saturate the power amplifier and thus cause intermodulation distortion
in the transmitted signal. When the number $N$ of subcarriers is large, the central limit
theorem may be used to model the combined signal on the $N$ subchannels as a zero-mean
Gaussian random process. In such a model, the voltage PAR is proportional to $\sqrt{N}$.
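
The square-root growth can be seen in a simple deterministic case. The sketch below does not implement the Gaussian model; it assumes the worst case in which $N$ unit-amplitude subcarriers at frequencies $k/T$ are all in phase at $t = 0$, so that the peak grows as $N$ while the rms value grows as $\sqrt{N/2}$. The function name and sample count are illustrative.

```python
import math

def voltage_par(N, M=1024):
    """Peak-to-rms ratio of the sum of N in-phase unit-amplitude cosines
    at frequencies k/T, k = 1..N, using M samples over one symbol interval."""
    samples = [sum(math.cos(2 * math.pi * k * m / M) for k in range(1, N + 1))
               for m in range(M)]
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / M)
    return peak / rms

par_16 = voltage_par(16)
par_64 = voltage_par(64)

# Quadrupling the number of subcarriers doubles the voltage PAR.
print(par_16, par_64, par_64 / par_16)
```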

To avoid intermodulation distortion, it is common to reduce the power in the transmitted
signal and thus operate the power amplifier at the transmitter in the linear operating
range. This power reduction or “power backoff” results in inefficient operation of
the communication system. For example, if the PAR is 10 dB, the power backoff may
be as much as 10 dB to avoid intermodulation distortion.

Various methods have been devised to reduce the PAR in multicarrier systems. 
One of the simplest methods is to insert different phase shifts in each of the subcarriers. 
These phase shifts can be selected pseudorandomly, or by means of some algorithm, 
to reduce the PAR. For example, we may have a small set of N stored pseudorandomly 
selected phase shifts which can be used when the PAR in the modulated subcarriers is 
large. The information on which set of pseudorandom phase shifts is used in any signal 
interval can be transmitted to the receiver on one of the N subcarriers. Alternatively, 
a single set of pseudorandom phase shifts may be employed, where this set is found 
via computer simulation to reduce the PAR to an acceptable level over the ensemble of 
possible transmitted data symbols on the N subcarriers. 
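
The idea of choosing among stored phase sets can be sketched as follows. Everything specific here is an assumption made for illustration: 16 subcarriers, four candidate phase sets drawn from a seeded generator, a complex baseband signal oversampled by four, and the PAR measured as a power ratio. The data block is the worst case in which all symbols are equal.

```python
import cmath
import math
import random

def power_par(X, phases, oversample=4):
    """Peak-to-average power ratio of sum_k X_k * exp(j*(phase_k + 2*pi*k*m/L))."""
    N = len(X)
    L = oversample * N
    powers = []
    for m in range(L):
        s = sum(X[k] * cmath.exp(1j * (phases[k] + 2 * math.pi * k * m / L))
                for k in range(N))
        powers.append(abs(s) ** 2)
    return max(powers) / (sum(powers) / L)

def best_phase_set(X, candidate_sets):
    """Return the index and PAR of the candidate set giving the lowest PAR."""
    pars = [power_par(X, ph) for ph in candidate_sets]
    best = min(range(len(pars)), key=pars.__getitem__)
    return best, pars[best]

N = 16
rng = random.Random(1)       # fixed seed so the stored sets are reproducible
candidates = [[rng.uniform(0.0, 2 * math.pi) for _ in range(N)]
              for _ in range(4)]

X = [1 + 0j] * N             # all symbols equal: subcarriers add in phase
par_plain = power_par(X, [0.0] * N)
index, par_best = best_phase_set(X, candidates)
print(par_plain, index, par_best)
```

The receiver needs only the index of the selected set to remove the phase shifts, which matches the side-information scheme described above.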

Another method that can be used to reduce the PAR is to modulate a small subset of 
the subcarriers with dummy symbols which are selected to reduce the PAR. Since the 
dummy symbols do not have to be constrained to take amplitude and phase values from 
a specified signal constellation, the design of the dummy symbols is very flexible. The 
subcarriers carrying dummy symbols may be distributed across the frequency band. 
Since modulating subcarriers with dummy symbols results in a lower throughput in 
data rate, it is desirable to employ only a small percentage of the total subcarriers for 
this purpose. 

As an alternative to allocating subcarriers that are modulated with dummy symbols, 
one may select a subset of subcarriers that already carry data and expand the signal 
constellation in such a manner that the data can be correctly detected at the receiver 
by use of a modulo-$q$ operation, where $q$ is an appropriate integer. For example, if
rectangular 16-point QAM is used as the modulation of each subcarrier, a minimally 
expanded signal constellation for a subset of subcarriers may consist of a 32-point 
signal constellation that includes the 16 additional points adjacent to the outer points in 
the original constellation. When the PAR of the original signal constellation exceeds a 
predetermined amount, the signal point on a selected subcarrier is replaced by a signal 
point from the minimally expanded set such that the PAR is reduced. This approach 
may require several iterations using a different subcarrier each time to reduce the PAR 
to a desired value. The interested reader may refer to the paper by Tellado and Cioffi 
(1998), which treats this method. 

In a digitally synthesized multicarrier signal, the PAR may be kept within a spec- 
ified limit by clipping the signal at the D/A converter. The clipping generally distorts 
the signal at the transmitter and hence degrades the performance at the receiver. The 
effect of clipping on the probability of error at the detector in an OFDM system has
been evaluated by Bahai and Saltzberg (1999). If the clipping occurs infrequently, the
occasional errors may be corrected by introducing a suitable error-correcting code.

Because of its practical importance, the problem of PAR reduction in multicar- 
rier systems has been investigated by many people, and methods other than the ones 
described above have been considered. The interested reader may refer to the papers 
by Boyd (1986), Popovic (1991), Jones et al. (1994), Wilkinson and Jones (1995), 
Wulich (1996), Li and Cimini (1997), Friese (1997), Muller et al. (1997), Tellado and 
Cioffi (1998), Wulich and Goldfeld (1999), Tarokh and Jafarkhani (2000), Peterson and 
Tarokh (2000), and Wunder and Boche (2003). 


11.2-9 Channel Coding Considerations in Multicarrier Modulation 

In single-carrier systems, channel coding is performed in the time domain. That is, 
the coded bits or symbols span multiple signal or symbol intervals. In multicarrier 
communication systems, such as OFDM, the frequency domain provides an additional 
dimension in which channel coding can be applied to achieve immunity against noise 
and other interference. 

One possible channel coding approach is to encode the information bits on each 
subcarrier separately (time-domain channel coding) using either a block code, or a 
convolutional code, or by employing trellis-coded modulation (TCM). In such a time- 
domain coding approach, the coded bits or symbols span multiple OFDM (multicarrier) 
frames. There are basically two disadvantages with time-domain channel coding for 
multicarrier communication systems. One is the encoding/decoding complexity in- 
volved in the operation of N parallel encoders/decoders for the N subchannels. The 
second is the latency (decoding delay) inherent in the decoding of the data on the N 
subcarriers over multiple frames. For example, the decoding delay for a code that spans
$K$ frames is $K N_f$ bits, where $N_f$ is the number of information bits per frame.

The decoding delay can be minimized by designing the channel code to span the bits 
across the subchannels for a single OFDM (multicarrier) frame. In such a frequency- 
domain coding approach we may employ a block code, or a convolutional code, or 
TCM. If additional delay beyond a single frame is tolerable, the channel code may be 
designed to span multiple OFDM frames. The advantage of this approach to channel 
coding for multicarrier communication systems is that a single encoder and decoder 
can be employed in the system, thus simplifying the system implementation. 

Although the channel coding methods for multicarrier modulation described above 
focused on simple coding techniques (block coding, convolutional coding, TCM), they 
are easily extended to concatenated coding and turbo coding methods. 


■ 11.3 

BIBLIOGRAPHICAL NOTES AND REFERENCES 

Multichannel signal transmission is commonly used on time-varying channels to over- 
come the effects of signal fading. This topic is treated in some detail in Chapter 13, 
where we provide a number of references to published work. Of particular relevance
to the treatment of multichannel digital communications given in this chapter are the
two publications by Price (1962a, b).

There is a large amount of literature on multicarrier digital communication systems. 
Such systems have been implemented and used for over 35 years. One of the earliest 
systems, described by Doeltz et al. (1957) and called Kineplex, was used for digital 
transmission in the HF band. Other early work on multicarrier system design has been 
reported in the papers by Chang (1966) and Saltzberg (1967). The use of the DFT for 
modulation and demodulation of multicarrier systems was proposed by Weinstein and 
Ebert (1971). 

Of particular interest in recent years is the use of multicarrier digital transmission 
for data, facsimile, and video on a variety of channels, including the narrowband (4 kHz) 
switched telephone network, the 48-kHz group telephone band, digital subscriber lines, 
cellular radio, and audio broadcast. The interested reader may refer to the many papers 
in the literature. We cite as examples the papers by Hirosaki (1981), Hirosaki et al. 
(1986), Chow et al. (1991), and the survey paper by Bingham (1990). The paper by 
Kalet (1989) gives a design procedure for optimizing the rate in a multicarrier QAM 
system given constraints on transmitter power and channel characteristics. Finally, we 
cite the book by Vaidyanathan (1993) and the papers by Tzannes et al. (1994) and Rizos 
et al. (1994) for a treatment of multirate digital filter banks, and the books by Starr et
al. (1999) and Bingham (2000) on the application of multicarrier modulation for digital 
transmission on digital subscriber lines. 


PROBLEMS 

11.1 $X_1, X_2, \ldots, X_N$ are a set of $N$ statistically independent and identically distributed real
Gaussian random variables with moments $E(X_i) = m$ and $\mathrm{var}(X_i) = \sigma^2$.
a. Define

$$U = \sum_{n=1}^{N} X_n$$

Evaluate the SNR of $U$, which is defined as

$$(\mathrm{SNR})_U = \frac{[E(U)]^2}{2\sigma_U^2}$$

where $\sigma_U^2$ is the variance of $U$.
b. Define

$$V = \sum_{n=1}^{N} X_n^2$$

Evaluate the SNR of $V$, which is defined as

$$(\mathrm{SNR})_V = \frac{[E(V)]^2}{2\sigma_V^2}$$

where $\sigma_V^2$ is the variance of $V$.

c. Plot $(\mathrm{SNR})_U$ and $(\mathrm{SNR})_V$ versus $m^2/\sigma^2$ on the same graph and, thus, compare the
SNRs graphically.

d. What does the result in (c) imply regarding coherent detection and combining versus
square-law detection and combining of multichannel signals?

11.2 A binary communication system transmits the same information on two diversity channels.
The two received signals are

$$r_1 = \pm\sqrt{\mathcal{E}_b} + n_1$$
$$r_2 = \pm\sqrt{\mathcal{E}_b} + n_2$$

where $E(n_1) = E(n_2) = 0$, $E(n_1^2) = \sigma_1^2$ and $E(n_2^2) = \sigma_2^2$, and $n_1$ and $n_2$ are uncorrelated
Gaussian variables. The detector bases its decision on the linear combination of $r_1$ and
$r_2$, i.e.,

$$r = r_1 + k r_2$$

a. Determine the value of $k$ that minimizes the probability of error.

b. Plot the probability of error for $\sigma_1^2 = 1$, $\sigma_2^2 = 3$, and either $k = 1$ or $k$ is the optimum
value found in (a). Compare the results.

11.3 Assess the cost of the cyclic prefix (used in multicarrier modulation to avoid ISI) in 
terms of 

a. Extra channel bandwidth. 

b. Extra signal energy. 

11.4 Let $x(n)$ be a finite-duration signal with length $N$ and let $X(k)$ be its $N$-point DFT. Suppose
we pad $x(n)$ with $L$ zeros and compute the $(N + L)$-point DFT, $X'(k)$. What is the
relationship between $X(0)$ and $X'(0)$? If we plot $|X(k)|$ and $|X'(k)|$ on the same graph,
explain the relationships between the two graphs.

11.5 Show that the sequence $\{x_n\}$ given by Equation 11.2-11 corresponds to the samples of the
signal $x(t)$ given by Equation 11.2-12.

11.6 Show that the IDFT of a sequence $\{X_k, 0 \le k \le N-1\}$ can be computed by passing the
sequence $\{X_k\}$ through a bank of $N$ linear discrete-time filters with system functions

$$H_n(z) = \frac{1}{1 - e^{j2\pi n/N} z^{-1}}$$

and sampling the filter outputs at $n = N$.

11.7 Plot $P_2(L)$, given by Equation 11.1-24, for $L = 1$ and $L = 2$ as a function of $10 \log \gamma_b$
and determine the loss in SNR due to the combining loss for $\gamma_b = 10$.



Spread Spectrum Signals for Digital 
Communications 


Spread spectrum signals used for the transmission of digital information are distin- 
guished by the characteristic that their bandwidth W is much greater than the informa- 
tion rate R in bits/s. That is, the bandwidth expansion factor B e = W/R for a spread 
spectrum signal is much greater than unity. The large redundancy inherent in spread 
spectrum signals is required to overcome the severe levels of interference that are 
encountered in the transmission of digital information over some radio and satellite 
channels. Since coded waveforms are also characterized by a bandwidth expansion 
factor greater than unity and since coding is an efficient method for introducing redun- 
dancy, it follows that coding is an important element in the design of spread spectrum 
signals and systems. 

A second important element employed in the design of spread spectrum signals 
is pseudorandomness, which makes the signals appear similar to random noise and 
difficult to demodulate by receivers other than the intended ones. This element is 
intimately related with the application or purpose of such signals. 

To be specific, spread spectrum signals are used for 


• Combating or suppressing the detrimental effects of interference due to jamming, 
interference arising from other users of the channel, and self-interference due to 
multipath propagation. 

• Hiding a signal by transmitting it at low power and, thus, making it difficult for an 
unintended listener to detect in the presence of background noise. 

• Achieving message privacy in the presence of other listeners. 


In applications other than communications, spread spectrum signals are used to obtain 
accurate range (time delay) and range rate (velocity) measurements in radar and navi- 
gation. For the sake of brevity, we shall limit our discussion to digital communication 
applications. 

In combating intentional interference (jamming), it is important to the communi- 
cators that the jammer who is trying to disrupt the communication does not have prior 
knowledge of the signal characteristics except for the overall channel bandwidth and 
the type of modulation (PSK, FSK, etc.) being used. If the digital information is just
encoded as described in Chapters 7 and 8, a sophisticated jammer can easily mimic the
signal emitted by the transmitter and, thus, confuse the receiver. To circumvent this pos- 
sibility, the transmitter introduces an element of unpredictability or randomness (pseu- 
dorandomness) in each of the transmitted coded signal waveforms that is known to the 
intended receiver but not to the jammer. As a consequence, the jammer must synthesize 
and transmit an interfering signal without knowledge of the pseudorandom pattern. 

Interference from the other users arises in multiple-access communication systems 
in which a number of users share a common channel bandwidth. At any given time, a 
subset of these users may transmit information simultaneously over the common chan- 
nel to corresponding receivers. Assuming that all the users employ the same code for 
the encoding and decoding of their respective information sequences, the transmitted 
signals in this common spectrum may be distinguished from one another by superim- 
posing a different pseudorandom pattern, also called a code, in each transmitted signal. 
Thus, a particular receiver can recover the transmitted information intended for it by 
knowing the pseudorandom pattern, i.e., the key, used by the corresponding transmitter. 
This type of communication technique, which allows multiple users to simultaneously 
use a common channel for transmission of information, is called code division multiple 
access (CDMA). CDMA will be considered in Sections 12.2 and 12.3. 

Resolvable multipath components resulting from time-dispersive propagation 
through a channel may be viewed as a form of self-interference. This type of inter- 
ference may also be suppressed by the introduction of a pseudorandom pattern in the 
transmitted signal, as will be described below. 

A message may be hidden in the background noise by spreading its bandwidth 
with coding and transmitting the resultant signal at a low average power. Because of its 
low power level, the transmitted signal is said to be “covert.” It has a low probability 
of being intercepted (detected) by a casual listener and, hence, is also called a low- 
probability-of-intercept (LPI) signal. 

Finally, message privacy may be obtained by superimposing a pseudorandom pat- 
tern on a transmitted message. The message can be demodulated by the intended re- 
ceivers, who know the pseudorandom pattern or key used at the transmitter, but not by 
any other receivers who do not have knowledge of the key. 

In the following sections, we shall describe a number of different types of spread 
spectrum signals, their characteristics, and their applications. The emphasis will be on 
the use of spread spectrum signals for combating interference (antijam or AJ signals), 
CDMA, and LPI. Before discussing the signal design problem, however, we shall briefly 
describe the types of channel characteristics assumed for the applications cited above. 


■ 12.1 

MODEL OF SPREAD SPECTRUM DIGITAL COMMUNICATION SYSTEM 

The block diagram shown in Figure 12.1-1 illustrates the basic elements of a spread 
spectrum digital communication system with a binary information sequence at its input 
at the transmitting end and at its output at the receiving end. The channel encoder 
and decoder and the modulator and demodulator are basic elements of the system, 
which were treated in Chapters 4, 7, and 8. In addition to these elements, we have two 
identical pseudorandom pattern generators, one that interfaces with the modulator at the 
transmitting end and a second that interfaces with the demodulator at the receiving end. 
The generators generate a pseudorandom or pseudonoise (PN) binary-valued sequence 
which is impressed on the transmitted signal at the modulator and removed from the 
received signal at the demodulator. 

FIGURE 12.1-1 

Model of spread spectrum digital communication system. 

Synchronization of the PN sequence generated at the receiver with the PN sequence 
contained in the incoming received signal is required in order to demodulate the re- 
ceived signal. Initially, prior to the transmission of information, synchronization may be 
achieved by transmitting a fixed pseudorandom bit pattern that the receiver will recog- 
nize in the presence of interference with a high probability. After time synchronization 
of the generators is established, the transmission of information may commence. 

Interference is introduced in the transmission of the information-bearing signal 
through the channel. The characteristics of the interference depend to a large extent 
on its origin. It may be categorized as being either broadband or narrowband relative 
to the bandwidth of the information-bearing signal and as either continuous or pulsed 
(discontinuous) in time. For example, an interfering signal may consist of one or more 
sinusoids in the bandwidth used to transmit the information. The frequencies of the 
sinusoids may remain fixed or they may change with time according to some rule. As 
a second example, the interference generated in CDMA by other users of the channel 
may be either broadband or narrowband, depending on the type of spread spectrum 
signal that is employed to achieve multiple access. If it is broadband, it may be charac- 
terized as an equivalent additive white Gaussian noise. We shall consider these types 
of interference and some others in the following sections. 

Our treatment of spread spectrum signals will focus on the performance of the dig- 
ital communication system in the presence of narrowband and broadband interference. 
Two types of modulation are considered: PSK and FSK. PSK is appropriate in appli- 
cations where phase coherence between the transmitted signal and the received signal 
can be maintained over a time interval that is relatively long compared to the reciprocal 
of the transmitted signal bandwidth. On the other hand, FSK modulation is appropriate 
in applications where such phase coherence cannot be maintained due to time-variant 
effects on the communications link. This may be the case in a communications link 
between two high-speed aircraft or between a high-speed aircraft and a ground terminal. 

The PN sequence generated at the modulator is used in conjunction with the 
PSK modulation to shift the phase of the PSK signal pseudorandomly as described 
in Section 12.2. The resulting modulated signal is called a direct sequence (DS) or a 
pseudo-noise (PN) spread spectrum signal. When used in conjunction with binary or 
M-ary (M > 2) FSK, the pseudorandom sequence selects the frequency of the transmitted 
signal pseudorandomly. The resulting signal is called a frequency-hopped (FH) 
spread spectrum signal. Although a number of other types of spread spectrum signals 
will be briefly described, the emphasis of our treatment will be on DS and FH spread 
spectrum signals. 


■ 12.2 

DIRECT SEQUENCE SPREAD SPECTRUM SIGNALS 


In the model shown in Figure 12.1-1, we assume that the information rate at the input 
to the encoder is R bits/s and the available channel bandwidth is W Hz. The modulation 
is assumed to be binary PSK. In order to utilize the entire available channel bandwidth, 
the phase of the carrier is shifted pseudorandomly according to the pattern from the PN 
generator at a rate W times/s. The reciprocal of W, denoted by $T_c$, defines the duration 
of a pulse, which is called a chip; $T_c$ is called the chip interval. The pulse is the basic 
element in a DS spread spectrum signal. 

If we define $T_b = 1/R$ to be the duration of a rectangular pulse corresponding to 
the transmission time of an information bit, the bandwidth expansion factor W/R may 
be expressed as 

$$\frac{W}{R} = \frac{T_b}{T_c} \tag{12.2-1}$$

In practical systems, the ratio $T_b/T_c$ is an integer, 

$$L_c = \frac{T_b}{T_c} \tag{12.2-2}$$

which is the number of chips per information bit. That is, $L_c$ is the number of phase shifts 
that can occur in the transmitted signal during the bit duration $T_b = 1/R$. Figure 12.2-1a 
illustrates the relationships between the PN signal and the data signal. 

Suppose that the encoder takes k information bits at a time and generates a binary 
linear (n, k) block code. The time duration available for transmitting the n code elements 
is $kT_b$ seconds. The number of chips that occur in this time interval is $kL_c$. Hence, 
we may select the block length of the code as $n = kL_c$. If the encoder generates a 
binary convolutional code of rate k/n, the number of chips in the time interval $kT_b$ 
is also $n = kL_c$. Therefore, the following discussion applies to both block codes and 
convolutional codes. We note that the code rate $R_c = k/n = 1/L_c$. 
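
As a quick numerical check of these relations, the following sketch computes the chip interval, the number of chips per bit, the block length, and the code rate for one set of parameters. The values chosen for R, W, and k are illustrative assumptions and do not come from the text.

```python
# Illustrative DS parameters (assumed values, not from the text).
R = 10_000        # information rate, bits/s
W = 1_000_000     # available channel bandwidth, Hz
k = 4             # information bits taken by the encoder at a time

T_b = 1 / R       # bit interval, s
T_c = 1 / W       # chip interval, s
L_c = W // R      # chips per information bit (Eq. 12.2-2), assumed to be an integer
n = k * L_c       # block length that fills the interval k*T_b with chips
R_c = k / n       # code rate, which equals 1/L_c

print(L_c, n, R_c)
```

With these assumed numbers the bandwidth expansion factor W/R and the number of chips per bit coincide, as Equations 12.2-1 and 12.2-2 require.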

One method for impressing the PN sequence on the transmitted signal is to alter 
directly the coded bits by modulo-2 addition with the PN sequence.† Thus, each coded 
bit is altered by its addition with a bit from the PN sequence. If $b_i$ represents the ith 
bit of the PN sequence and $c_i$ is the corresponding bit from the encoder, the modulo-2 
sum is 

$$a_i = b_i \oplus c_i \tag{12.2-3}$$

†When four-phase PSK is desired, one PN sequence is added to the information sequence carried on the 
in-phase signal component and a second PN sequence is added to the information sequence carried on the 
quadrature component. In many PN spread spectrum systems, the same binary information sequence is 
added to the two PN sequences to form the two quadrature components. Thus, a four-phase PSK signal is 
generated with a binary information stream. 

FIGURE 12.2-1 

The PN and data signals (a) and the QPSK modulator (b) for a DS spread spectrum system. 


Hence, $a_i = 1$ if either $b_i = 1$ and $c_i = 0$ or $b_i = 0$ and $c_i = 1$; also $a_i = 0$ if either 
$b_i = 1$ and $c_i = 1$ or $b_i = 0$ and $c_i = 0$. We may say that $a_i = 0$ when $b_i = c_i$ and 
$a_i = 1$ when $b_i \ne c_i$. The sequence $\{a_i\}$ is mapped into a binary PSK signal of the 
form $s(t) = \pm\mathrm{Re}[g(t)e^{j2\pi f_c t}]$ according to the convention 

$$g_i(t) = \begin{cases} g(t - iT_c) & a_i = 0 \\ -g(t - iT_c) & a_i = 1 \end{cases} \tag{12.2-4}$$


where g(t) represents a pulse of duration $T_c$ seconds and arbitrary shape. 

The modulo-2 addition of the coded sequence $\{c_i\}$ and the sequence $\{b_i\}$ from 
the PN generator may also be represented as a multiplication of two waveforms. To 
demonstrate this point, suppose that the elements of the coded sequence are mapped 
into a binary PSK signal according to the relation 

$$c_i(t) = (2c_i - 1)g(t - iT_c) \tag{12.2-5}$$

Similarly, we define a waveform $p_i(t)$ as 

$$p_i(t) = (2b_i - 1)p(t - iT_c) \tag{12.2-6}$$

where p(t) is a rectangular pulse of duration $T_c$. Then the equivalent low-pass transmitted 
signal corresponding to the ith coded bit is 

$$g_i(t) = p_i(t)c_i(t) = (2b_i - 1)(2c_i - 1)g(t - iT_c) \tag{12.2-7}$$

This signal is identical to the one given by Equation 12.2-4, which is obtained from the 
sequence $\{a_i\}$. Consequently, modulo-2 addition of the coded bits with the PN sequence 
followed by a mapping that yields a binary PSK signal is equivalent to multiplying a 
binary PSK signal generated from the coded bits with a sequence of unit amplitude 
rectangular pulses, each of duration $T_c$, and with a polarity which is determined from 
the PN sequence according to Equation 12.2-6. Although it is easier to implement 
modulo-2 addition followed by PSK modulation instead of waveform multiplication, 
it is convenient, for purposes of demodulation, to consider the transmitted signal in 
the multiplicative form given by Equation 12.2-7. A functional block diagram of a 
four-phase PSK-DS spread spectrum modulator is shown in Figure 12.2-1(b). 
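
The equivalence between the two descriptions can be checked exhaustively for a single chip. The sketch below compares the polarity obtained by modulo-2 addition followed by the mapping of Equation 12.2-4 with the polarity obtained from the product in Equation 12.2-7. The function names are hypothetical and serve only this illustration.

```python
from itertools import product


def psk_sign_from_xor(b: int, c: int) -> int:
    """Modulo-2 add the PN bit and the coded bit, then map a=0 -> +1, a=1 -> -1."""
    a = b ^ c
    return 1 - 2 * a


def psk_sign_from_product(b: int, c: int) -> int:
    """Multiply the two antipodal levels, as in the multiplicative form."""
    return (2 * b - 1) * (2 * c - 1)


# One row per (b, c) pair: both polarities should agree in every row.
table = [(b, c, psk_sign_from_xor(b, c), psk_sign_from_product(b, c))
         for b, c in product((0, 1), repeat=2)]
for row in table:
    print(row)
```

Both columns agree for all four combinations of the PN bit and the coded bit, which is the chip-level statement of the equivalence described above.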

The received equivalent low-pass signal for the ith code element is 

$$r_i(t) = p_i(t)c_i(t) + z(t) = (2b_i - 1)(2c_i - 1)g(t - iT_c) + z(t), \qquad iT_c \le t \le (i+1)T_c \tag{12.2-8}$$

where z(t) represents the low-pass equivalent noise and interference signal corrupting 
the information-bearing signal. This signal is assumed to be a stationary random process 
with zero mean. 

If z(t) is a sample function from a complex-valued Gaussian process, the optimum 
demodulator may be implemented either as a filter matched to the waveform g(t) or 
as a correlator, as illustrated by the block diagrams in Figure 12.2-2. In the matched 
filter realization, the sampled output from the matched filter is multiplied by $2b_i - 1$, 
which is obtained from the PN generator at the demodulator when the PN generator is 
properly synchronized. Since $(2b_i - 1)^2 = 1$ when $b_i = 0$ and $b_i = 1$, the effect of the 
PN sequence on the received coded bits is thus removed. 

FIGURE 12.2-2 

Possible demodulator structures for PN spread spectrum signals. 

In Figure 12.2-2, we also observe that the cross correlation can be accomplished in 
either one of two ways. The first, illustrated in Figure 12.2-2b, involves premultiplying 
$r_i(t)$ with the waveform $p_i(t)$ generated from the output of the PN generator and then 
cross-correlating with $g^*(t)$ and sampling the output in each chip interval. The second 
method, illustrated in Figure 12.2-2c, involves cross correlation with $g^*(t)$ first, sampling 
the output of the correlator and, then, multiplying this output with $2b_i - 1$, which 
is obtained from the PN generator. 
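
The two correlator orderings can be compared in a small discrete-time sketch. It assumes a real rectangular chip pulse sampled several times per chip and real Gaussian interference samples. The sample counts, noise level, and variable names are assumptions made for illustration only.

```python
import random

rng = random.Random(1)          # fixed seed so the run is repeatable
SAMPLES_PER_CHIP = 8            # assumed oversampling of each chip
N_CHIPS = 16

g = [1.0] * SAMPLES_PER_CHIP    # rectangular chip pulse (real, so g* = g)
b = [rng.randint(0, 1) for _ in range(N_CHIPS)]   # PN bits
c = [rng.randint(0, 1) for _ in range(N_CHIPS)]   # coded bits


def received_chip(i):
    """Samples of r_i(t): the spread chip plus Gaussian interference."""
    level = (2 * b[i] - 1) * (2 * c[i] - 1)
    return [level * g[s] + rng.gauss(0.0, 0.5) for s in range(SAMPLES_PER_CHIP)]


r = [received_chip(i) for i in range(N_CHIPS)]

# Ordering 1: multiply by the PN waveform first, then correlate with g.
y_pre = [sum((2 * b[i] - 1) * r[i][s] * g[s] for s in range(SAMPLES_PER_CHIP))
         for i in range(N_CHIPS)]
# Ordering 2: correlate with g first, then multiply the sample by 2*b_i - 1.
y_post = [(2 * b[i] - 1) * sum(r[i][s] * g[s] for s in range(SAMPLES_PER_CHIP))
          for i in range(N_CHIPS)]

max_diff = max(abs(u - v) for u, v in zip(y_pre, y_post))
print(max_diff)
```

Because the PN waveform is constant over a chip, the two orderings produce the same chip-rate outputs, which is why either structure may be used.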


If z(t) is not a Gaussian random process, the demodulation methods illustrated 
in Figure 12.2-2 are no longer optimum. Nevertheless, we may still use any of these 
three demodulator structures to demodulate the received signal. When the statistical 
characteristics of the interference z(t) are unknown a priori, this is certainly one possible 
approach. An alternative method, which is described later, utilizes an adaptive filter 
prior to the matched filter or correlator to whiten the interference. The rationale for this 
second method is also described later. 

In Section 12.2-1, we derive the error rate performance of the DS spread spectrum 
system in the presence of wideband and narrowband interference. The derivations are 
based on the assumption that the demodulator is any of the three equivalent structures 
shown in Figure 12.2-2. 


12.2-1 Error Rate Performance of the Decoder 

Let the unquantized output of the demodulator be denoted by $y_j$, $1 \le j \le n$. First we 
consider a linear binary (n, k) block code and, without loss of generality, we assume 
that the all-zero code word is transmitted. 

A decoder that employs soft-decision decoding computes the correlation metrics 

$$CM_i = \sum_{j=1}^{n} (2c_{ij} - 1)y_j, \qquad i = 1, 2, \ldots, 2^k \tag{12.2-9}$$

where $c_{ij}$ denotes the jth bit in the ith code word. The correlation metric corresponding 
to the all-zero code word is 

$$CM_1 = 2n\mathcal{E}_c + \sum_{j=1}^{n} (2c_{1j} - 1)(2b_j - 1)\nu_j = 2n\mathcal{E}_c - \sum_{j=1}^{n} (2b_j - 1)\nu_j \tag{12.2-10}$$

where $\nu_j$, $1 \le j \le n$, is the additive noise and interference term corrupting the jth 
coded bit and $\mathcal{E}_c$ is the chip energy. It is defined as 

$$\nu_j = \mathrm{Re}\left\{\int_0^{T_c} g^*(t)\,z[t + (j-1)T_c]\,dt\right\}, \qquad j = 1, 2, \ldots, n \tag{12.2-11}$$

Similarly, the correlation metric corresponding to code word $\mathbf{c}_m$ having weight $w_m$ 
is 

$$CM_m = 2n\mathcal{E}_c\left(1 - \frac{2w_m}{n}\right) + \sum_{j=1}^{n} (2c_{mj} - 1)(2b_j - 1)\nu_j \tag{12.2-12}$$

Following the procedure used in Section 7.4, we shall determine the probability 
that $CM_m > CM_1$. The difference between $CM_1$ and $CM_m$ is 

$$D = CM_1 - CM_m = 4\mathcal{E}_c w_m - 2\sum_{j=1}^{n} c_{mj}(2b_j - 1)\nu_j \tag{12.2-13}$$

Since the codeword $\mathbf{c}_m$ has weight $w_m$, there are $w_m$ nonzero components in the 
summation of noise terms contained in Equation 12.2-13. We shall assume that the 
minimum distance of the code is sufficiently large that we can invoke the central limit 
theorem for the summation of noise components. This assumption is valid for DS spread 
spectrum signals that have a bandwidth expansion of 10 or more.† Thus, the summation 
of noise components is modeled as a Gaussian random variable. Since $E(2b_j - 1) = 0$ 
and $E(\nu_j) = 0$, the mean of the second term in Equation 12.2-13 is also zero. 

The variance is 

$$\sigma_m^2 = 4\sum_{j=1}^{n}\sum_{i=1}^{n} c_{mi}c_{mj}E[(2b_j - 1)(2b_i - 1)]E(\nu_i\nu_j) \tag{12.2-14}$$

†Typically, the bandwidth expansion factor in a spread spectrum signal is of the order of 10 to 100 and 
sometimes higher. 

The sequence of binary digits from the PN generator is assumed to be uncorrelated. 
Hence 

$$E[(2b_j - 1)(2b_i - 1)] = \delta_{ij} \tag{12.2-15}$$

and 

$$\sigma_m^2 = 4w_m E(\nu^2) \tag{12.2-16}$$

where $E(\nu^2)$ is the second moment of any one element from the set $\{\nu_j\}$. This moment 
is easily evaluated to yield 

$$E(\nu^2) = \frac{1}{2}\int_0^{T_c}\int_0^{T_c} g^*(t)g(\tau)R_{zz}(t - \tau)\,dt\,d\tau = \frac{1}{2}\int_{-\infty}^{\infty} |G(f)|^2 S_{zz}(f)\,df \tag{12.2-17}$$

where $R_{zz}(\tau) = E[z^*(t)z(t + \tau)]$ is the autocorrelation function and $S_{zz}(f)$ is the power 
spectral density of the interference z(t). 

We observe that when the interference is spectrally flat within the bandwidth‡ 
occupied by the transmitted signal, i.e., 

$$S_{zz}(f) = 2J_0, \qquad |f| \le \tfrac{1}{2}W \tag{12.2-18}$$

the second moment in Equation 12.2-17 is $E(\nu^2) = 2\mathcal{E}_c J_0$, and, hence, the variance of 
the interference term in Equation 12.2-16 becomes 

$$\sigma_m^2 = 8\mathcal{E}_c J_0 w_m \tag{12.2-19}$$

‡If the bandwidth of the bandpass channel is W, that of the equivalent low-pass channel is $\frac{1}{2}W$. 

In this case, the probability that D < 0 is 

$$P_2(m) = Q\left(\sqrt{\frac{2\mathcal{E}_c}{J_0}w_m}\right) \tag{12.2-20}$$

But the energy per coded bit $\mathcal{E}_c$ may be expressed in terms of the energy per information 
bit $\mathcal{E}_b$ as 

$$\mathcal{E}_c = \frac{k}{n}\mathcal{E}_b = R_c\mathcal{E}_b \tag{12.2-21}$$

With this substitution, Equation 12.2-20 becomes 

$$P_2(m) = Q\left(\sqrt{\frac{2\mathcal{E}_b}{J_0}R_c w_m}\right) = Q\left(\sqrt{2\gamma_b R_c w_m}\right) \tag{12.2-22}$$

where $\gamma_b = \mathcal{E}_b/J_0$ is the SNR per information bit. Finally, the code word error 
probability may be upper-bounded by the union bound as 

$$P_e \le \sum_{m=2}^{M} Q\left(\sqrt{2\gamma_b R_c w_m}\right) \tag{12.2-23}$$

where $M = 2^k$. Note that this expression is identical to the probability of a code word 
error for soft-decision decoding of a linear binary block code in an AWGN channel. 
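
To see how the union bound of Equation 12.2-23 behaves numerically, the sketch below evaluates it for a small example code. The (7,4) Hamming code and its generator matrix are assumptions chosen only because the weights are easy to enumerate. A practical DS system would use a much longer, lower-rate code with $n = kL_c$.

```python
import math
from itertools import product


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


# Systematic generator matrix of a (7,4) Hamming code, used only as an example.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
k, n = len(G), len(G[0])


def codeword_weights():
    """Weights w_m of the 2^k - 1 nonzero code words."""
    weights = []
    for msg in product((0, 1), repeat=k):
        if not any(msg):
            continue
        word = [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G)]
        weights.append(sum(word))
    return weights


def union_bound(gamma_b_db: float) -> float:
    """Right-hand side of Eq. 12.2-23 for an SNR per bit given in dB."""
    gamma_b = 10.0 ** (gamma_b_db / 10.0)
    r_c = k / n
    return sum(q_function(math.sqrt(2.0 * gamma_b * r_c * w))
               for w in codeword_weights())


for snr_db in (4.0, 6.0, 8.0):
    print(snr_db, union_bound(snr_db))
```

The bound falls rapidly as the SNR per bit increases, and at high SNR it is dominated by the minimum-weight code words.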

Although we have considered a binary block code in the derivation given above, 
the procedure is similar for an (n, k) convolutional code. The result of such a derivation 
is the following upper bound on the equivalent bit error probability: 

$$P_b \le \frac{1}{k}\sum_{d=d_{\text{free}}}^{\infty} \beta_d\, Q\left(\sqrt{2\gamma_b R_c d}\right) \tag{12.2-24}$$

The set of coefficients $\{\beta_d\}$ is obtained from an expansion of the derivative of the 
transfer function T(Y, Z), as described in Section 8.2-2. 

Next, we consider a narrowband interference centered at the carrier (at DC for 
the equivalent low-pass signal). We may fix the total (average) interference power to 
$J_{av} = 2J_0W$, where $2J_0$ is the value of the power spectral density of an equivalent 
wideband interference. The narrowband interference is characterized by the power 
spectral density 

$$S_{zz}(f) = \begin{cases} \dfrac{J_{av}}{W_1} & |f| \le \tfrac{1}{2}W_1 \\ 0 & |f| > \tfrac{1}{2}W_1 \end{cases} \tag{12.2-25}$$

where $W \gg W_1$. 

Substitution of Equation 12.2-25 for $S_{zz}(f)$ into Equation 12.2-17 yields 

$$E(\nu^2) = \frac{J_{av}}{2W_1}\int_{-W_1/2}^{W_1/2} |G(f)|^2\,df \tag{12.2-26}$$

The value of $E(\nu^2)$ depends on the spectral characteristics of the pulse g(t). In the 
following example, we consider two special cases. 

example 12.2-1. Suppose that g(t) is a rectangular pulse as shown in Figure 12.2-3(a) 
and $|G(f)|^2$ is the corresponding energy density spectrum shown in Figure 12.2-3(b). 
For the narrowband interference given by Equation 12.2-25, the variance of the total 
interference is 

$$\sigma_m^2 = 4w_m E(\nu^2) = \frac{4\mathcal{E}_c w_m T_c J_{av}}{W_1}\int_{-W_1/2}^{W_1/2}\left(\frac{\sin \pi f T_c}{\pi f T_c}\right)^2 df = \frac{4\mathcal{E}_c w_m J_{av}}{W_1}\int_{-\beta/2}^{\beta/2}\left(\frac{\sin \pi x}{\pi x}\right)^2 dx \tag{12.2-27}$$

where $\beta = W_1T_c$. Figure 12.2-4 illustrates the value of this integral for $0 \le \beta \le 1$. 
We observe that the value of the integral is upper-bounded by unity. Hence, 
$\sigma_m^2 \le 4\mathcal{E}_c w_m J_{av}/W_1$. 

FIGURE 12.2-3 

Rectangular pulse and its energy density spectrum. 

FIGURE 12.2-4 

Plot of the value of the integral in Equation 12.2-27. 

In the limit as $W_1$ becomes zero, the interference becomes an impulse at the carrier. 
In this case the interference is a pure frequency tone and it is usually called a continuous 
wave (CW) interfering signal. The power spectral density is 

$$S_{zz}(f) = J_{av}\delta(f) \tag{12.2-28}$$

and the corresponding variance for the decision variable $D = CM_1 - CM_m$ is 

$$\sigma_m^2 = 2w_m J_{av}|G(0)|^2 = 4w_m\mathcal{E}_c T_c J_{av} \tag{12.2-29}$$

The probability of a codeword error for CW interference is upper-bounded as 

$$P_e \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{4\mathcal{E}_c w_m}{J_{av}T_c}}\right) \tag{12.2-30}$$

But $\mathcal{E}_c = R_c\mathcal{E}_b$. Furthermore, $T_c \approx 1/W$ and $J_{av}/W = 2J_0$. Therefore Equation 
12.2-30 may be expressed as 

$$P_e \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{2\mathcal{E}_b}{J_0}R_c w_m}\right) \tag{12.2-31}$$

which is the result obtained previously for broadband interference. This result indicates 
that a CW interference has the same effect on performance as an equivalent broadband 
interference. This equivalence is discussed further below. 

FIGURE 12.2-5 

A sinusoidal signal pulse. 


example 12.2-2. Let us determine the performance of the DS spread spectrum system 
in the presence of a CW interference of average power $J_{av}$ when the transmitted signal 
pulse g(t) is one-half cycle of a sinusoid as illustrated in Figure 12.2-5, i.e., 

$$g(t) = \sqrt{\frac{4\mathcal{E}_c}{T_c}}\,\sin\frac{\pi t}{T_c}, \qquad 0 \le t \le T_c \tag{12.2-32}$$

The variance of the interference for this pulse is 

$$\sigma_m^2 = 2w_m J_{av}|G(0)|^2 = \frac{32}{\pi^2}\mathcal{E}_c T_c J_{av} w_m \tag{12.2-33}$$

Hence, the upper bound on the codeword error probability is 

$$P_e \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{\pi^2\mathcal{E}_c w_m}{2J_{av}T_c}}\right) \tag{12.2-34}$$

We observe that the performance obtained with this pulse is 0.9 dB better than that 
obtained with a rectangular pulse. Recall that this pulse shape when used in offset 
QPSK results in an MSK signal. MSK modulation is frequently used in DS spread 
spectrum systems. 
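
The 0.9-dB figure can be checked numerically. For CW interference the variance is proportional to $|G(0)|^2$, so it is enough to compare that quantity for the two pulses when both carry the same chip energy. The sketch below does this with a simple midpoint-rule integration. The normalization $\mathcal{E}_c = T_c = 1$ and the helper names are assumptions made for illustration.

```python
import math


def g0_squared(pulse, t_c: float, steps: int = 100_000) -> float:
    """|G(0)|^2, where G(0) is the integral of g(t) over one chip (midpoint rule)."""
    dt = t_c / steps
    area = sum(pulse((i + 0.5) * dt) for i in range(steps)) * dt
    return area * area


E_C = 1.0   # chip energy (assumed normalization)
T_C = 1.0   # chip interval (assumed normalization)


def rectangular(t):
    # Amplitude chosen so that the integral of g^2 over the chip equals 2*E_C.
    return math.sqrt(2.0 * E_C / T_C)


def half_sine(t):
    # Half-cycle sinusoid with the same energy 2*E_C.
    return math.sqrt(4.0 * E_C / T_C) * math.sin(math.pi * t / T_C)


# A smaller |G(0)|^2 means less CW interference at the correlator output,
# so the ratio below is the SNR advantage of the half-sine pulse.
gain_db = 10.0 * math.log10(g0_squared(rectangular, T_C) / g0_squared(half_sine, T_C))
print(round(gain_db, 2))
```

The computed advantage is $10\log_{10}(\pi^2/8)$, which is about 0.91 dB and agrees with the figure quoted in the example.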


The processing gain and the interference margin An interesting interpretation 
of the performance characteristics for the DS spread spectrum signal is obtained by 
expressing the signal energy per bit $\mathcal{E}_b$ in terms of the average power. That is, 
$\mathcal{E}_b = P_{av}T_b$, where $P_{av}$ is the average signal power and $T_b$ is the bit interval. Let us consider 
the performance obtained in the presence of CW interference for the rectangular pulse 
treated in Example 12.2-1. When we substitute for $\mathcal{E}_b$ and $J_0$ into Equation 12.2-31, 
we obtain 

$$P_e \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{4P_{av}T_b}{J_{av}T_c}R_c w_m}\right) = \sum_{m=2}^{M} Q\left(\sqrt{\frac{4L_c}{J_{av}/P_{av}}R_c w_m}\right) \tag{12.2-35}$$

where $L_c$ is the number of chips per information bit and $P_{av}/J_{av}$ is the signal-to- 
interference power ratio. 

An identical result is obtained with broadband interference for which the performance 
is given by Equation 12.2-23. For the signal energy per bit, we have 

$$\mathcal{E}_b = P_{av}T_b = \frac{P_{av}}{R} \tag{12.2-36}$$

where R is the information rate in bits/s. The power spectral density for the interference 
may be expressed as 

$$2J_0 = \frac{J_{av}}{W}$$

Using this relation and Equation 12.2-36, the ratio $\mathcal{E}_b/J_0$ may be expressed as 

$$\frac{\mathcal{E}_b}{J_0} = \frac{P_{av}/R}{J_{av}/2W} = \frac{2W/R}{J_{av}/P_{av}} \tag{12.2-37}$$


The ratio $J_{av}/P_{av}$ is the interference-to-signal power ratio, which is usually greater 
than unity. The ratio $W/R = T_b/T_c = B_e = L_c$ is just the bandwidth expansion factor, 
or, equivalently, the number of chips per information bit. This ratio is usually called the 
processing gain of the DS spread spectrum system. It represents the advantage gained 
over the interference that is obtained by expanding the bandwidth of the transmitted 
signal. If we interpret $\mathcal{E}_b/J_0$ as the SNR required to achieve a specified error rate 
performance and W/R as the available bandwidth expansion factor, the ratio $J_{av}/P_{av}$ 
is called the interference margin of the DS spread spectrum system. In other words, the 
interference margin is the largest value that the ratio $J_{av}/P_{av}$ can take and still satisfy 
the specified error probability. 

The performance of a soft-decision decoder for a linear (n, k) binary code, 
expressed in terms of the processing gain and the interference margin, is 

$$P_e \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{4W/R}{J_{av}/P_{av}}R_c w_m}\right) \le (M-1)\,Q\left(\sqrt{\frac{4W/R}{J_{av}/P_{av}}R_c d_{\min}}\right) \tag{12.2-38}$$

In addition to the processing gain W/R and $J_{av}/P_{av}$, we observe that the performance 
depends on a third factor, namely, $R_c w_m$. This factor is the coding gain. A lower 
bound on this factor is $R_c d_{\min}$. Thus the interference margin achieved by the DS spread 
spectrum signal depends on the processing gain and the coding gain. 
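
The trade-off among the required SNR, the processing gain, and the coding gain is easiest to see in decibels. The sketch below rearranges Equation 12.2-37, with the coding gain $R_c d_{\min}$ from Equation 12.2-38 included as an optional factor, to give the largest tolerable $J_{av}/P_{av}$. The function name and its arguments are hypothetical. The sample numbers (W/R = 1000 and a required SNR of 10.5 dB) are those of the uncoded binary PSK case worked in Example 12.2-3.

```python
import math


def interference_margin_db(bandwidth_expansion: float,
                           required_snr_db: float,
                           coding_gain: float = 1.0) -> float:
    """Largest tolerable J_av/P_av in dB.

    Uses E_b/J_0 = 2 (W/R) / (J_av/P_av), with the argument of Q scaled by
    the coding gain R_c * d_min (coding_gain = 1.0 means no coding gain).
    """
    processing_db = 10.0 * math.log10(2.0 * bandwidth_expansion)
    coding_db = 10.0 * math.log10(coding_gain)
    return processing_db + coding_db - required_snr_db


# W/R = 1000 and 10.5 dB required SNR, with no coding gain.
margin = interference_margin_db(1000.0, 10.5)
print(round(margin, 1))
```

Note that the factor of 2 multiplying W/R contributes about 3 dB, so a bandwidth expansion factor of 1000 (30 dB) enters the margin as roughly 33 dB.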

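We may express the relationship among these three quantities in dB as 

$$(\mathrm{SNR})_{\mathrm{dB}} = \left(\frac{2W}{R}\right)_{\mathrm{dB}} + (R_c d_{\min})_{\mathrm{dB}} - \left(\frac{J_{av}}{P_{av}}\right)_{\mathrm{dB}} \tag{12.2-39}$$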

where $(\mathrm{SNR})_{\mathrm{dB}}$ is the signal-to-noise ratio required by the receiver to achieve a 
specified level of performance. 

Uncoded DS spread spectrum signals The performance results given above for 
DS spread spectrum signals generated by means of an (n, k) code may be specialized 
to a trivial type of code, namely, a binary repetition code. For this case, k = 1 and the 
weight of the nonzero code word is w = n. Thus, $R_c w = 1$ and, hence, the performance 
of the binary signaling system reduces to 

$$P_2 = Q\left(\sqrt{\frac{2\mathcal{E}_b}{J_0}}\right) = Q\left(\sqrt{\frac{4W/R}{J_{av}/P_{av}}}\right) \tag{12.2-40}$$

Note that the trivial (repetition) code gives no coding gain. It does result in a 
processing gain of W/R. 

example 12.2-3. Suppose that we wish to achieve an error rate performance of $10^{-6}$ or 
less with an uncoded DS spread spectrum system. The available bandwidth expansion 
factor is W/R = 1000. Let us determine the jamming margin. 

The $\mathcal{E}_b/J_0$ required to achieve a bit error probability of $10^{-6}$ with uncoded binary 
PSK is 10.5 dB. The processing gain is $10\log_{10} 1000 = 30$ dB. Hence the maximum 
interference-to-signal power that can be tolerated, i.e., the interference margin, is 

$$10\log_{10}\frac{J_{av}}{P_{av}} = 33 - 10.5 = 22.5 \text{ dB}$$

Since this is the interference margin achieved with an uncoded DS spread spectrum 
system, it may be increased by coding the information sequence. 

There is another way to view the modulation and demodulation processes for the 
uncoded (repetition code) DS spread spectrum system. At the modulator, the signal 
waveform generated by the repetition code with rectangular pulses, for example, is 
identical to a unit amplitude rectangular pulse s(t) of duration $T_b$ or its negative, 
depending on whether the information bit is 1 or 0, respectively. This may be seen from 
Equation 12.2-7, where the coded chips $\{c_i\}$ within a single information bit are either 
all 1s or all 0s. The PN sequence multiplies either s(t) or −s(t). Thus, when the 
information bit is a 1, the $L_c$ PN chips generated by the PN generator are transmitted with the 
same polarity. On the other hand, when the information bit is a 0, the $L_c$ PN chips when 
multiplied by −s(t) are reversed in polarity. 

The demodulator for the repetition code, implemented as a correlator, is illustrated 
in Figure 12.2-6. We observe that the integration interval in the integrator is the bit 
interval $T_b$. Thus, the decoder for the repetition code is eliminated and its function is 
subsumed in the demodulator. 

FIGURE 12.2-6 

Correlation-type demodulator for a repetition code. 

Now let us qualitatively assess the effect of this demodulation process on the 
interference z(t). The multiplication of z(t) by the output of the PN generator, which 
is expressed as 

$$w(t) = \sum_i (2b_i - 1)p(t - iT_c)$$

yields 

$$v(t) = w(t)z(t)$$

The waveforms w(t) and z(t) are statistically independent random processes each with 
zero mean and autocorrelation functions $R_{ww}(\tau)$ and $R_{zz}(\tau)$, respectively. The product 
v(t) is also a random process having an autocorrelation function equal to the product 
of $R_{ww}(\tau)$ with $R_{zz}(\tau)$. Hence, the power spectral density of the process v(t) is equal 
to the convolution of the power spectral density of w(t) with the power spectral density 
of z(t). 

The effect of convolving the two spectra is to spread the power in bandwidth. 
Since the bandwidth of w(t) occupies the available channel bandwidth W, the result 
of convolution of the two spectra is to spread the power spectral density of z(t) over 
the frequency band of width W. If z(t) is a narrowband process, i.e., its power spectral 
density has a width much less than W, the power spectral density of the process v(t) 
will occupy a bandwidth equal to at least W. 

The integrator used in the cross correlation shown in Figure 12.2-6 has a bandwidth 
approximately equal to $1/T_b$. Since $1/T_b \ll W$, only a fraction of the total interference 
power appears at the output of the correlator. This fraction is approximately equal to 
the ratio of bandwidths $1/T_b$ to W. That is, 

$$\frac{1/T_b}{W} = \frac{1}{WT_b} = \frac{T_c}{T_b} = \frac{1}{L_c}$$

In other words, the multiplication of the interference with the signal from the PN 
generator spreads the interference to the signal bandwidth W, and the narrowband 
integration following the multiplication sees only the fraction $1/L_c$ of the total interference. 
Thus, the performance of the uncoded DS spread spectrum system is enhanced by the 
processing gain $L_c$. 
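
The $1/L_c$ reduction can be observed in a simple chip-rate simulation. The sketch below models a CW interferer at the carrier as a constant level in every chip, multiplies it by independent ±1 PN chips, and averages over the $L_c$ chips of one bit in place of the integrator. These modeling choices, the function name, and the parameter values are assumptions for illustration and are not part of the derivation above.

```python
import random


def output_interference_fraction(l_c: int, n_bits: int, seed: int = 0) -> float:
    """Estimate the fraction of CW interference power left after despreading.

    The interference is a constant level in every chip, the PN chips are
    independent +/-1 values, and the integrator is an average over the
    l_c chips of one bit.
    """
    rng = random.Random(seed)
    amplitude = 1.0
    total = 0.0
    for _ in range(n_bits):
        acc = 0.0
        for _ in range(l_c):
            pn = 1.0 if rng.random() < 0.5 else -1.0
            acc += pn * amplitude          # v = w * z, chip by chip
        total += (acc / l_c) ** 2          # power at the integrator output
    return (total / n_bits) / amplitude ** 2


L_C = 64
fraction = output_interference_fraction(L_C, n_bits=4000)
print(fraction, 1.0 / L_C)
```

The estimated fraction should lie close to $1/L_c$, in line with the bandwidth-ratio argument. A pseudorandom generator from the standard library stands in for a true PN sequence here.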

Linear code concatenated with a repetition code As illustrated above, a binary 
repetition code provides a margin against an interference signal but yields no coding 
gain. To obtain an improvement in performance, we may use a linear $(n_1, k)$ block or 
convolutional code, where $n_1 \le n = kL_c$. One possibility is to select $n_1 < n$ and to 
repeat each code bit $n_2$ times such that $n = n_1n_2$. Thus, we can construct a linear (n, k) 
code by concatenating the $(n_1, k)$ code with a binary $(n_2, 1)$ repetition code. This may 
be viewed as a trivial form of code concatenation where the outer code is the $(n_1, k)$ 
code and the inner code is the repetition code. 

Since the repetition code yields no coding gain, the coding gain achieved by the 
combined code must reduce to that achieved by the $(n_1, k)$ outer code. It is demonstrated 
that this is indeed the case. The coding gain of the overall combined code is 

$$R_c w_m = \frac{k}{n}w_m, \qquad m = 2, 3, \ldots, 2^k$$

But the weights $\{w_m\}$ for the combined code may be expressed as 

$$w_m = n_2 w_m^o$$

where $\{w_m^o\}$ are the weights of the outer code. Therefore, the coding gain of the combined 
code is 

$$R_c w_m = \frac{k}{n_1n_2}n_2w_m^o = \frac{k}{n_1}w_m^o = R_c^o w_m^o \tag{12.2-41}$$

which is just the coding gain obtained from the outer code. 
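
This identity can be confirmed by direct enumeration for a small case. The sketch below builds a combined code by repeating every bit of an outer code word $n_2$ times and compares the two coding gains code word by code word, using exact rational arithmetic. The (7,4) Hamming outer code and the choice $n_2 = 5$ are assumptions made only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Outer code: systematic (7,4) Hamming code, chosen only for illustration.
G_OUTER = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
K, N1 = len(G_OUTER), len(G_OUTER[0])
N2 = 5                      # repetitions per outer code bit (assumed)


def outer_encode(msg):
    """Encode k message bits with the outer block code."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G_OUTER)]


def combined_encode(msg):
    """Outer code word with every bit repeated N2 times."""
    return [bit for bit in outer_encode(msg) for _ in range(N2)]


gains = []
for msg in product((0, 1), repeat=K):
    if not any(msg):
        continue
    w_outer = sum(outer_encode(msg))
    w_combined = sum(combined_encode(msg))
    gain_outer = Fraction(K, N1) * w_outer
    gain_combined = Fraction(K, N1 * N2) * w_combined
    gains.append((gain_outer, gain_combined))

print(all(a == b for a, b in gains))
```

Every nonzero code word gives the same value for $R_c w_m$ and $R_c^o w_m^o$, so the repetition stage adds block length without adding coding gain.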

A coding gain is also achieved if the $(n_1, k)$ outer code is decoded using hard 
decisions. The probability of a bit error obtained with an $(n_2, 1)$ repetition code (based 
on soft-decision decoding) is 

$$p = Q\left(\sqrt{\frac{2\mathcal{E}_b}{J_0}R_c^o}\right) = Q\left(\sqrt{\frac{4W/R}{J_{av}/P_{av}}R_c^o}\right) \tag{12.2-42}$$

Then the codeword error probability for a linear $(n_1, k)$ block code is upper-bounded 
as 

$$P_e \le \sum_{m=t+1}^{n_1}\binom{n_1}{m}p^m(1-p)^{n_1-m} \tag{12.2-43}$$

where $t = \lfloor\frac{1}{2}(d_{\min} - 1)\rfloor$, or as 

$$P_e \le \sum_{m=2}^{M}[4p(1-p)]^{w_m^o/2} \tag{12.2-44}$$

where the latter is a Chernoff bound. For an $(n_1, k)$ binary convolutional code, the upper 
bound on the bit error probability is 

$$P_b \le \frac{1}{k}\sum_{d=d_{\text{free}}}^{\infty}\beta_d P_2(d) \tag{12.2-45}$$

where $P_2(d)$ is defined by Equation 8.2-16 for odd d and by Equation 8.2-17 for 
even d. 

Concatenated coding for DS spread spectrum systems It is apparent from the 
above discussion that an improvement in performance can be obtained by replacing 
the repetition code by a more powerful code that will yield a coding gain in addition 
to the processing gain. Basically, the objective in a DS spread spectrum system is to 
construct a long, low-rate code having a large minimum distance. This may be best 
accomplished by using code concatenation. When binary PSK is used in conjunction with 
DS spread spectrum, the elements of a concatenated code word must be expressed in 
binary form. 

Best performance is obtained when soft-decision decoding is used on both the 
inner and outer codes. However, an alternative, which usually results in reduced com- 
plexity for the decoder, is to employ soft-decision decoding on the inner code and 
hard-decision decoding on the outer code. The expressions for the error rate perfor- 
mance of these decoding schemes depend, in part, on the type of codes (block or 
convolutional) selected for the inner and outer codes. For example, the concatenation 
of two block codes may be viewed as an overall long binary (n, k) block code having a 
performance given by Equation 12.2-38. The performance of other code combinations 
may also be readily derived. For the sake of brevity, we shall not consider such code 
combinations. 


12.2-2 Some Applications of DS Spread Spectrum Signals 

In this subsection, we shall briefly consider the use of coded DS spread spectrum signals 
for two specific applications. One is concerned with a communication signal that is 
hidden in the background noise by transmitting the signal at a very low power level. 
The second application is concerned with accommodating a number of simultaneous 
signal transmissions on the same channel, i.e., CDMA. 

Low-detectability signal transmission. In this application, the signal is purposely
transmitted at a very low power level relative to the background channel noise and
thermal noise that is generated in the front end of the receiver. If the DS spread
spectrum signal occupies a bandwidth $W$ and the spectral density of the additive noise is
$N_0/2$ W/Hz, the average noise power in the bandwidth $W$ is $N_{av} = W N_0$.

The average received signal power at the intended receiver is $P_{av}$. If we wish to hide
the presence of the signal from receivers that are in the vicinity of the intended receiver,
the signal is transmitted at a low power level such that $P_{av}/N_{av} \ll 1$. For example, let
us assume that binary PSK is used to transmit the information. The probability of error
at the intended receiver may be expressed as






From this expression, we observe that even though $P_{av}/N_{av} \ll 1$, the intended receiver
can recover the information-bearing signal with the aid of the processing gain and 
the coding gain. However, any other receiver that has no prior knowledge of the PN 
sequence is unable to take advantage of the processing gain and the coding gain. Hence, 
the presence of the information-bearing signal is difficult to detect. We say that the signal 
has a low probability of being intercepted (LPI) and it is called an LPI signal. 

The probability of error results given in Section 12.2-1 also apply to the demodu- 
lation and decoding of LPI signals at the intended receiver. 
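
A small numerical sketch may help make the LPI argument concrete. It uses only the definitions above: with $N_{av} = W N_0$ and $\mathcal{E}_b = P_{av}/R$, the SNR per bit at the intended receiver is $\mathcal{E}_b/N_0 = (W/R)(P_{av}/N_{av})$, and a code contributes a further factor $R_c d_{\min}$ in the dominant term of the error bound. The numbers and function names below are assumed for illustration.

```python
from math import log10

def to_db(x):
    return 10.0 * log10(x)

def snr_per_bit_db(p_over_n_db, processing_gain, rc_dmin=1.0):
    """E_b/N_0 in dB at the intended receiver, including the factor R_c * d_min."""
    return p_over_n_db + to_db(processing_gain) + to_db(rc_dmin)

interceptor_db = -20.0  # assumed P_av/N_av seen by a receiver without the PN sequence
intended_db = snr_per_bit_db(interceptor_db, processing_gain=1000, rc_dmin=4)
print(round(intended_db, 2))  # 16.02
```

Under these assumed values, a signal that sits 20 dB below the noise in the band $W$ still yields an effective SNR of about 16 dB at the receiver that knows the PN sequence and the code.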


Code division multiple access. The enhancement in performance obtained from a
DS spread spectrum signal through the processing gain and coding gain can be used 
to enable many DS spread spectrum signals to occupy the same channel bandwidth 
provided that each signal has its own distinct PN sequence. Thus, it is possible to have 
several users transmit messages simultaneously over the same channel bandwidth. This 
type of digital communication in which each user (transmitter-receiver pair) has a 
distinct PN code for transmitting over a common channel bandwidth is called code 
division multiple access (CDMA). 

In the demodulation of each PN signal, the signals from the other simultaneous 
users of the channel appear as an additive interference. The level of interference varies, 
depending on the number of users at any given time. A major advantage of CDMA is 
that a large number of users can be accommodated if each transmits messages for a 
short period of time. In such a multiple access system, it is relatively easy either to add 
new users or to decrease the number of users without disrupting the system. 

Let us determine the number of simultaneous signals that can be supported in
a CDMA system.† For simplicity, we assume that all signals have identical average
powers. Thus, if there are $N_u$ simultaneous users, the desired signal-to-noise interference
power ratio at a given receiver is

$$\frac{P_{av}}{J_{av}} = \frac{P_{av}}{(N_u - 1)P_{av}} = \frac{1}{N_u - 1} \tag{12.2-46}$$


Hence, the performance for soft-decision decoding at the given receiver is
upper-bounded as

$$P_e \le \sum_{m=2}^{M} Q\left(\sqrt{\frac{4W/R}{N_u - 1}\, R_c w_m}\right) \le (M - 1)\, Q\left(\sqrt{\frac{4W/R}{N_u - 1}\, R_c d_{\min}}\right) \tag{12.2-47}$$


In this case, we have assumed that the interference from other users is Gaussian. 

As an example, suppose that the desired level of performance (error probability of
$10^{-6}$) is achieved when

$$\frac{4W/R}{N_u - 1}\, R_c d_{\min} = 40$$


†In this section the interference from other users is treated as a random process. This is the case if there
is no cooperation among the users. In Chapter 16 we consider CDMA transmission in which interference 
from other users is known and is suppressed by the receiver. 
Then the maximum number of users that can be supported in the CDMA system is

$$N_u = \frac{W/R}{10}\, R_c d_{\min} + 1 \tag{12.2-48}$$

If $W/R = 100$ and $R_c d_{\min} = 4$, as obtained with the Golay (24, 12) code, the maximum
number is $N_u = 41$. If $W/R = 1000$ and $R_c d_{\min} = 4$, this number becomes $N_u = 401$.
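
The arithmetic in this example can be checked with a few lines of code. The sketch below simply solves the condition $\frac{4W/R}{N_u - 1} R_c d_{\min} = 40$ for $N_u$; the function name and the `required` parameter are introduced here for illustration only.

```python
def max_users(processing_gain, rc_dmin, required=40.0):
    """Largest N_u satisfying (4 W/R) / (N_u - 1) * R_c * d_min >= required."""
    return int(4.0 * processing_gain * rc_dmin / required) + 1

print(max_users(100, 4))   # 41
print(max_users(1000, 4))  # 401
```

Because the supportable number of users scales linearly with both $W/R$ and $R_c d_{\min}$, doubling either one roughly doubles the capacity under the equal-power, Gaussian-interference assumption.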

In determining the maximum number of simultaneous users of the channel, we 
have implicitly assumed that the PN code sequences are mutually orthogonal and the 
interference from other users adds on a power basis only. However, orthogonality among 
a number of PN code sequences is not easily achieved, especially if the number of PN 
code sequences required is large. In fact, the selection of a good set of PN sequences 
for a CDMA system is an important problem that has received considerable attention 
in the technical literature. We shall briefly discuss this problem in Section 12.2-5. 

Digital cellular CDMA system based on DS spread spectrum. Direct sequence
CDMA has been adopted as one multiple-access method for digital cellular voice 
communications in North America. This digital cellular communication system was 
proposed and developed by Qualcomm and has been standardized and designated as 
IS-95 by the Telecommunications Industry Association (TIA) for use in the 800-MHz 
and in the 1900-MHz frequency bands. 

The nominal bandwidth used for transmission from a base station to the mobile 
receivers (forward link) is 1.25 MHz, and a separate channel, also with a bandwidth 
of 1.25 MHz, is used for signal transmission from mobile receivers to a base station 
(reverse link). The signals transmitted in both the forward and the reverse links are DS 
spread spectrum signals having a chip rate of $1.2288 \times 10^6$ chips per second (1.2288 Mchips/s).

Forward link. A block diagram of the modulator for the signals transmitted from
a base station to the mobile receivers is shown in Figure 12.2-7. The speech coder is a
code-excited linear predictive (CELP) coder which generates data at the variable rates
of 9600, 4800, 2400, and 1200 bits/s, where the data rate is a function of the speech
activity of the user, in frame intervals of 20 ms. The data from the speech coder is
encoded by a rate 1/2, constraint length $K = 9$ convolutional code. For lower speech
activity, where the data rates are 4800, 2400, or 1200 bits/s, the output symbols from 
the convolutional encoder are repeated either twice, four times, or eight times so as 
to maintain a constant bit rate of 9600 bits/s. At the lower speech activity rates, the 
transmitter power is reduced by either 3, 6, or 9 dB, so that the transmitted energy per 
bit remains constant for all speech rates. Thus, a lower speech activity results in a lower 
transmitter power and, hence, a lower level of interference to other users. 

The encoded bits for each frame are passed through a block interleaver, which is 
needed to overcome the effects of burst errors that may occur in transmission through 
the channel. The data bits at the output of the block interleaver, which occur at a rate 
of 19.2 kbits/s, are scrambled by multiplication with the output of a long code (period 
$N = 2^{42} - 1$) generator running at the chip rate of 1.2288 Mchips/s, but whose output is
decimated by a factor of 64 to 19.2 kchips/s. The long code is used to uniquely identify 
a call of a mobile station on the forward and reverse links. 
Each user of the channel is assigned a Hadamard (or Walsh) sequence of length 64.
There are 64 orthogonal Hadamard sequences assigned to each base station, and, thus, 
there are 64 channels available. One Hadamard sequence (the all-zero sequence) is used 
to transmit a pilot signal, which serves as a means for measuring the channel character- 
istics, including the signal strength and the carrier phase offset. These parameters are 
used at the receiver in performing phase coherent demodulation. Another Hadamard 
sequence is used for providing time synchronization. One channel, and possibly more 
if necessary, is used for paging. That leaves up to 61 channels for allocation to different 
users. 

Each user, using the Hadamard sequence assigned to it, multiplies the data sequence 
by the assigned Hadamard sequence. Thus, each encoded data bit is multiplied by the 
Hadamard sequence of length 64. The resulting binary sequence is now spread by 
multiplication with two PN sequences of length $N = 2^{15}$, so as to create in-phase and
quadrature signal components. Thus, the binary data signal is converted to a four-phase 
signal and both the I and Q components are filtered by baseband spectral shaping filters. 
Different base stations are identified by different offsets of these PN sequences. The 
signals for all the 64 channels are transmitted synchronously so that, in the absence of 
channel multipath distortion, the signals of other users received at any mobile receiver 
do not interfere because of the orthogonality of the Hadamard sequences. 
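
The orthogonality argument can be illustrated with a short sketch. It builds 64 Hadamard sequences with the Sylvester construction (one common way to generate them; the text does not say which construction IS-95 specifies), maps the binary values to $\pm 1$, and shows that two synchronous users do not interfere after despreading. The row indices 5 and 9 and the helper names are arbitrary choices for the example.

```python
def hadamard(n):
    """Sylvester construction: n x n matrix of +1/-1 entries, n a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def correlate(a, b):
    return sum(x * y for x, y in zip(a, b))

H = hadamard(64)  # row 0 is all +1, i.e. the all-zero binary sequence

# Two synchronous users each send one coded bit (+1 or -1) on different rows.
bit_a, bit_b = +1, -1
chips = [bit_a * ca + bit_b * cb for ca, cb in zip(H[5], H[9])]

# Correlating with a user's own row recovers that user's bit with no crosstalk.
print(correlate(chips, H[5]) // 64, correlate(chips, H[9]) // 64)  # 1 -1
```

With multipath or timing offsets the rows are no longer aligned and the cross-correlations are generally nonzero, which is the situation the text contrasts with the synchronous case.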

At the receiver, a RAKE demodulator is used to resolve the major multipath sig- 
nal components, which are then phase-aligned and weighted according to their signal 
strength using the estimates of phase and signal strength derived from the pilot signal. 
These components are combined and passed to the Viterbi soft-decision decoder. The 
RAKE demodulator is described in detail in Chapter 13. 

Reverse link. The modulator for the reverse link from a mobile transmitter to a base
station is different from that for the forward link. A block diagram of the modulator 
is shown in Figure 12.2-8. An important consideration in the design of the modulator 
is that signals transmitted from the various mobile transmitters to the base station 
are asynchronous and, hence, there is significantly more interference among users. 
Secondly, the mobile transmitters are usually battery operated and, consequently, these 
transmissions are power limited. To compensate for these major limitations, a K = 9, 
rate 1/3 convolutional code is used in the reverse link. Although this code has essentially
the same coding gain in an AWGN channel as the rate 1/2 code used in the forward link,
it has a much higher coding gain in a fading channel, which is the characteristic of digital 
cellular communication links, as we shall observe in our treatment of communication 
through fading channels in Chapter 13. As in the case of the forward link, for lower 
speech activity, the output bits from the convolutional encoder are repeated either two, 
or four, or eight times. However, the coded bit rate is 28.8 kbits/s. 

For each 20-ms frame, the 576 encoded bits are block-interleaved and passed to 
the modulator. The data is modulated using an M = 64 orthogonal signal set using 
Hadamard sequences of length 64. Thus, a 6-bit block of data is mapped into one 
of the 64 Hadamard sequences. The result is a bit (or chip) rate of 307.2 kbits/s at 
the output of the modulator. We note that 64-ary orthogonal modulation at an error
probability of $10^{-6}$ requires approximately 3.5 dB less SNR per bit than binary antipodal
signaling. 
To reduce interference to other users, the time position of the transmitted code
symbol repetitions is randomized so that, at the lower speech activity, consecutive 
bursts do not occur evenly spaced in time. Following the randomizer, the signal is 
spread by the output of the long code PN generator, which is running at a rate of 
1.2288 Mchips/s. Hence, there are only four PN chips for every bit of the Hadamard 
sequence from the modulator, so the processing gain in the reverse link is very small. 
The resulting 1.2288 Mchips/s binary sequence at the output of the multiplier is 
then further multiplied by two PN sequences of length $N = 2^{15}$, whose rate is also
1.2288 Mchips/s, to create I and Q signals (a QPSK signal) which are filtered by
baseband spectral shaping filters and then passed to quadrature mixers. The Q-channel
signal is delayed in time by one-half PN chip relative to the I-channel signal prior to
the baseband filter. In effect, the signal at the output of the two baseband filters is an 
offset QPSK signal. 
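
The rates quoted for the forward and reverse links are tied together by simple ratios, and the following sketch recomputes them from the full-rate speech coder output and the chip rate. The variable names are illustrative; the numerical values are the ones given in the text.

```python
CHIP_RATE = 1_228_800  # chips/s
FULL_RATE = 9_600      # bits/s from the speech coder at full rate
FRAME = 0.020          # frame duration in seconds

# Forward link: rate 1/2 code; long code decimated by 64; length-64 Hadamard spreading.
fwd_coded_rate = FULL_RATE * 2                    # 19 200 coded bits/s
fwd_long_code_rate = CHIP_RATE // 64              # 19 200 scrambling symbols/s
fwd_chips_per_coded_bit = CHIP_RATE // fwd_coded_rate

# Reverse link: rate 1/3 code; 6 coded bits select one of 64 Hadamard sequences.
rev_coded_rate = FULL_RATE * 3                    # 28 800 coded bits/s
rev_bits_per_frame = round(rev_coded_rate * FRAME)
rev_walsh_chip_rate = rev_coded_rate // 6 * 64    # 307 200 Hadamard bits/s
rev_pn_chips_per_walsh_chip = CHIP_RATE // rev_walsh_chip_rate

print(fwd_coded_rate, fwd_long_code_rate, fwd_chips_per_coded_bit)
print(rev_coded_rate, rev_bits_per_frame, rev_walsh_chip_rate, rev_pn_chips_per_walsh_chip)
```

The last figure reproduces the observation that only four PN chips are available per Hadamard bit on the reverse link.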

Although the chips are transmitted as an offset QPSK signal, the demodulator 
employs noncoherent demodulation of the M = 64 orthogonal Hadamard waveforms 
to recover the encoded data bits. A fast Hadamard transform is used to reduce the 
computational complexity in the demodulation process. The output of the demodula- 
tor is then fed to the Viterbi detector, whose output is used to synthesize the speech 
signal. 
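
The role of the fast Hadamard transform can be sketched as follows. The butterfly routine computes all 64 correlations in $64 \log_2 64$ additions instead of $64^2$ multiply-adds, and the detector picks the index of the largest magnitude. This is a simplified real-valued illustration: a true noncoherent receiver would combine I and Q correlator outputs, and here a sign flip merely stands in for the unknown carrier phase. Function names and the Sylvester ordering of the sequences are assumptions for the example.

```python
def walsh_row(k, n=64):
    """Row k of the n x n Sylvester-ordered Hadamard matrix (+1/-1 entries)."""
    return [-1 if bin(k & j).count("1") % 2 else 1 for j in range(n)]

def fwht(x):
    """Fast Walsh-Hadamard transform in Sylvester (natural) order."""
    y = list(x)
    h = 1
    while h < len(y):
        for i in range(0, len(y), 2 * h):
            for j in range(i, i + h):
                y[j], y[j + h] = y[j] + y[j + h], y[j] - y[j + h]
        h *= 2
    return y

def detect(received):
    """Index of the largest-magnitude correlation; the sign is ignored."""
    corr = fwht(received)
    return max(range(len(corr)), key=lambda k: abs(corr[k]))

symbol = 0b100101                     # a 6-bit block of coded data (index 37)
rx = [-c for c in walsh_row(symbol)]  # sign flip stands in for an unknown phase
print(detect(rx), fwht(walsh_row(symbol))[symbol])  # 37 64
```

The detected index is then mapped back to its 6 coded bits before being passed to the Viterbi decoder.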


12.2-3 Effect of Pulsed Interference on DS Spread Spectrum Systems 

Thus far, we have considered the effect of continuous interference or jamming on a 
DS spread spectrum signal. We have observed that the processing gain and coding gain 
provide a means for overcoming the detrimental effects of this type of interference. 
However, there is a jamming threat that has a dramatic effect on the performance of 
a DS spread spectrum system. That jamming signal consists of pulses of spectrally 
flat noise that covers the entire signal bandwidth $W$. This is usually called pulsed
interference.

Suppose the jammer has an average power $J_{av}$ in the signal bandwidth $W$. Hence
$2J_0 = J_{av}/W$. Instead of transmitting continuously, the jammer transmits pulses at a
power $J_{av}/\alpha$ for a fraction $\alpha$ of the time, i.e., the probability that the jammer is transmitting
at a given instant is $\alpha$. For simplicity, we assume that an interference pulse spans an
integral number of signaling intervals and, thus, it affects an integral number of bits.
When the jammer is not transmitting, the transmitted bits are assumed to be received
error-free, and when the jammer is transmitting, the probability of error for an uncoded
DS spread spectrum system is $Q\left(\sqrt{2\alpha\mathcal{E}_b/J_0}\right)$. Hence, the average probability of a bit
error is

$$P_2(\alpha) = \alpha\, Q\left(\sqrt{2\alpha\mathcal{E}_b/J_0}\right) \tag{12.2-49}$$

The jammer selects the duty cycle $\alpha$ to maximize the error probability. On differentiating
Equation 12.2-49 with respect to $\alpha$, we find that the worst-case pulse jamming occurs
when

$$\alpha^* = \begin{cases} \dfrac{0.71}{\mathcal{E}_b/J_0}, & \mathcal{E}_b/J_0 \ge 0.71 \\[2mm] 1, & \mathcal{E}_b/J_0 < 0.71 \end{cases} \tag{12.2-50}$$

and the corresponding error probability is

$$P_2 = \begin{cases} \dfrac{0.083}{\mathcal{E}_b/J_0}, & \mathcal{E}_b/J_0 \ge 0.71 \\[2mm] Q\left(\sqrt{2\mathcal{E}_b/J_0}\right), & \mathcal{E}_b/J_0 < 0.71 \end{cases} \tag{12.2-51}$$


The error rate performance given by Equation 12.2-49 for $\alpha = 1.0$, 0.1, and 0.01
along with the worst-case performance based on $\alpha^*$ is plotted in Figure 12.2-9. By
comparing the error rate for continuous Gaussian noise jamming with worst-case pulse
jamming, we observe a large difference in performance, which is approximately 40 dB
at an error rate of $10^{-6}$.

We should point out that the above analysis applies when the jammer pulse duration 
is equal to or greater than the bit duration. In addition, we should indicate that practical 
considerations may prohibit the jammer from achieving high peak power (small values 
of $\alpha$). Nevertheless, the error probability given by Equation 12.2-51 serves as an upper
bound on the performance of the uncoded binary PSK in worst-case pulse jamming. 
Clearly, the performance of the DS spread spectrum system in the presence of such 
interference is extremely poor. 
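
The worst-case duty cycle can also be found numerically, which gives a quick check on the constants 0.71 and 0.083. The sketch below assumes the average error probability has the form $\alpha\, Q(\sqrt{2\alpha\mathcal{E}_b/J_0})$ implied by the description above (error-free when the jammer is off) and searches a grid of duty cycles; the grid search and the function names are choices made for this example.

```python
from math import erfc, sqrt

def q(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def pulse_jam_error(alpha, eb_j0):
    """Average bit error probability for duty cycle alpha (eb_j0 is a linear ratio)."""
    return alpha * q(sqrt(2.0 * alpha * eb_j0))

def worst_case(eb_j0, steps=100_000):
    """Grid search over 0 < alpha <= 1; returns (alpha_star, error probability)."""
    p, alpha = max((pulse_jam_error(k / steps, eb_j0), k / steps) for k in range(1, steps + 1))
    return alpha, p

eb_j0 = 100.0  # 20 dB, an assumed operating point
alpha_star, p_worst = worst_case(eb_j0)
print(round(alpha_star * eb_j0, 2), round(p_worst * eb_j0, 3))  # 0.71 0.083
print(worst_case(0.5)[0])                                       # 1.0
```

At 20 dB the continuous jammer ($\alpha = 1$) gives a negligible error probability, whereas the worst-case pulse jammer holds it near $8 \times 10^{-4}$, which illustrates the inverse-linear rather than exponential decay with SNR.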

If we simply add coding to the DS spread spectrum system, the improvement over
the uncoded system is the coding gain. Thus, $\mathcal{E}_b/J_0$ is reduced by the coding gain,
which in most cases is limited to less than 10 dB. The reason for the poor performance
is that the jamming signal pulse duration may be selected to affect many consecutive
coded bits when the jamming signal is turned on. Consequently, the code word error
probability is high due to the burst characteristics of the jammer.

FIGURE 12.2-9
Performance of DS binary PSK with pulse interference.

FIGURE 12.2-10
Block diagram of AJ communication system.

In order to improve the performance, we should interleave the coded bits prior 
to transmission over the channel. The effect of the interleaving, as discussed in
Section 7.12, is to make the coded bits that are hit by the jammer statistically independent.

The block diagram of the digital communication system that includes interleaving/ 
deinterleaving is shown in Figure 12.2-10. Also shown is the possibility that the receiver 
knows the jammer state, i.e., that it knows when the jammer is on or off. Knowledge 
of the jammer state (called side information) is sometimes available from channel 
measurements of noise power levels in adjacent frequency bands. In our treatment, 
we consider two extreme cases, namely, no knowledge of the jammer state or
complete knowledge of the jammer state. In any case, the random variable $\zeta$ representing
the jammer state is characterized by the probabilities

$$P(\zeta = 1) = \alpha, \qquad P(\zeta = 0) = 1 - \alpha \tag{12.2-52}$$

When the jammer is on, the channel is modeled as an AWGN channel with power spectral
density $N_0 = J_0/\alpha$, and when the jammer is off, there is no noise in the channel.
Knowledge of the jammer state implies that the decoder knows when $\zeta = 1$ and when
$\zeta = 0$, and uses this information in the computation of the correlation metrics. For
example, the decoder may weight the demodulator output for each coded bit by the
reciprocal of the noise power level in the interval. Alternatively, the decoder may give
zero weight (erasure) to a jammed bit.

First, let us consider the effect of jamming without knowledge of the jammer state. 
The interleaver/deinterleaver pair is assumed to result in statistically independent jam- 
mer hits of the coded bits. As an example of the performance achieved with coding, 
we cite the performance results from the paper of Martin and McAdam (1980). There 
the performance of binary convolutional codes is evaluated for worst-case pulse jam- 
ming. Both hard- and soft-decision Viterbi decoding are considered. Soft decisions 
are obtained by quantizing the demodulator output to eight levels. For this purpose, a
uniform quantizer is used for which the threshold spacing is optimized for the pulse 
jammer noise level. The quantizer plays the important role of limiting the size of the 
demodulator output when the pulse jammer is on. The limiting action ensures that any 
hit on a coded bit does not heavily bias the corresponding path metrics. 

The optimum duty cycle for the pulse jammer in the coded system is generally
inversely proportional to the SNR, but its value is different from that given by
Equation 12.2-50 for the uncoded system. Figure 12.2-11 illustrates graphically the optimal
jammer duty cycle for both hard- and soft-decision decoding of the rate 1/2
convolutional codes. The corresponding error rate results for this worst-case pulse jammer are
illustrated in Figures 12.2-12 and 12.2-13 for rate 1/2 codes with constraint lengths
$3 \le K \le 9$. For example, note that at $P_2 = 10^{-6}$, the $K = 7$ convolutional code
with soft-decision decoding requires $\mathcal{E}_b/J_0 = 7.6$ dB, whereas hard-decision decoding
requires $\mathcal{E}_b/J_0 = 11.7$ dB. This 4.1-dB difference in SNR is relatively large. With
continuous Gaussian noise, the corresponding SNRs for an error rate of $10^{-6}$ are 5 dB
for soft-decision decoding and 7 dB for hard-decision decoding. Hence, the worst-case
pulse jammer has degraded the performance by 2.6 dB for soft-decision decoding and
by 4.7 dB for hard-decision decoding. These levels of degradation increase as the
constraint length of the convolutional code is decreased. The important point, however, is
that the loss in SNR due to jamming has been reduced from 40 dB for the uncoded
system to less than 5 dB for the coded system based on a $K = 7$, rate 1/2 convolutional
code with interleaving.

A simpler method for evaluating the performance of a coded anti-jamming (AJ)
communication system is to use the cutoff rate parameter $R_0$ as proposed by Omura
and Levitt (1982). For example, with binary-coded modulation, the cutoff rate may be
expressed as

$$R_0 = 1 - \log_2(1 + \Delta_\alpha) \tag{12.2-53}$$

where the factor $\Delta_\alpha$ depends on the channel noise characteristics and the decoder
processing. Recall that for binary PSK in an AWGN channel and soft-decision decoding,

$$\Delta_\alpha = e^{-\mathcal{E}_c/N_0} \tag{12.2-54}$$

where $\mathcal{E}_c$ is the energy per coded bit; and for hard-decision decoding,

$$\Delta_\alpha = \sqrt{4p(1 - p)} \tag{12.2-55}$$

where $p$ is the probability of a coded bit error. Here, we have $N_0 = J_0$.

FIGURE 12.2-12
Performance of rate 1/2 convolutional codes with hard-decision Viterbi decoding
binary PSK with worst-case pulse jamming. Vertical axis: probability of a bit error, $P_b$.
[From Martin and McAdam (1980). © 1980 IEEE.]

FIGURE 12.2-13
Performance of rate 1/2 convolutional codes with soft-decision Viterbi decoding binary
PSK with worst-case pulse jamming. Vertical axis: probability of a bit error, $P_b$.
[From Martin and McAdam (1980). © 1980 IEEE.]

For coded binary PSK with pulse jamming, Omura and Levitt (1982) have shown
that

$$\Delta_\alpha = \alpha e^{-\alpha\mathcal{E}_c/N_0} \qquad \text{for soft-decision decoding with knowledge of jammer state} \tag{12.2-56}$$

$$\Delta_\alpha = \min_{\lambda \ge 0} \left\{ \left[ \alpha \exp\left(\lambda^2 \mathcal{E}_c N_0/\alpha\right) + 1 - \alpha \right] \exp(-2\lambda\mathcal{E}_c) \right\} \qquad \text{for soft-decision decoding with no knowledge of jammer state} \tag{12.2-57}$$

$$\Delta_\alpha = \alpha\sqrt{4p(1 - p)} \qquad \text{for hard-decision decoding with knowledge of the jammer state} \tag{12.2-58}$$

$$\Delta_\alpha = \sqrt{4\alpha p(1 - \alpha p)} \qquad \text{for hard-decision decoding with no knowledge of the jammer state} \tag{12.2-59}$$

where the probability of error for hard-decision decoding of binary PSK is

$$p = Q\left(\sqrt{\frac{2\alpha\mathcal{E}_c}{N_0}}\right)$$

The graphs for $R_0$ as a function of $\mathcal{E}_c/N_0$ are illustrated in Figure 12.2-14 for
the cases given above. Note that these graphs represent the cutoff rate for the
worst-case value of $\alpha = \alpha^*$ that maximizes $\Delta_\alpha$ (minimizes $R_0$) for each value of $\mathcal{E}_c/N_0$.
Furthermore, note that with soft-decision decoding and no knowledge of the jammer
state, $R_0 = 0$. This situation results from the fact that the demodulator output is not
quantized.

FIGURE 12.2-14
Cutoff rate for coded DS binary PSK modulation. [From Omura and Levitt (1982). © 1982 IEEE.]
(0) Soft-decision decoding in AWGN ($\alpha = 1$)
(1) Soft-decision with jammer state information
(2) Hard-decision with jammer state information
(3) Soft-decision with no jammer state information
(4) Hard-decision with no jammer state information

The graphs in Figure 12.2-14 may be used to evaluate the performance of coded
systems. To demonstrate the procedure, suppose that we wish to determine the SNR
required to achieve an error probability of $10^{-6}$ with coded binary PSK in worst-case
pulse jamming. To be specific, we assume that we have a rate 1/2, $K = 7$ convolutional
code. We begin with the performance of the rate 1/2, $K = 7$ convolutional code with
soft-decision decoding in an AWGN channel. At $P_2 = 10^{-6}$, the SNR required is found
from Figure 8.6-1 to be

$$\frac{\mathcal{E}_b}{N_0} = 5 \text{ dB}$$

Since the code is rate 1/2, we have

$$\frac{\mathcal{E}_c}{N_0} = 2 \text{ dB}$$

Now, we go to the graphs in Figure 12.2-14 and find that for the AWGN channel
(reference system) with $\mathcal{E}_c/N_0 = 2$ dB, the corresponding value of the cutoff rate is

$$R_0 = 0.74 \text{ bit per symbol}$$

If we have another channel with different noise characteristics (a worst-case pulse noise
channel) but with the same value of the cutoff rate $R_0$, then the upper bound on the
bit error probability is the same, i.e., $10^{-6}$ in this case. Consequently, we can use this
rate to determine the SNR required for the worst-case pulse jammer channel. From the
graphs in Figure 12.2-14, we find that

$$\frac{\mathcal{E}_c}{J_0} = \begin{cases} 10 \text{ dB} & \text{for hard-decision decoding with no knowledge of jammer state} \\ 5 \text{ dB} & \text{for hard-decision decoding with knowledge of jammer state} \\ 3 \text{ dB} & \text{for soft-decision decoding with knowledge of jammer state} \end{cases}$$

Therefore, the corresponding values of $\mathcal{E}_b/J_0$ for the rate 1/2, $K = 7$ convolutional
code are 13, 8, and 6 dB, respectively.
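
The graphical procedure can be mimicked numerically. The sketch below assumes the cutoff rate is $R_0 = 1 - \log_2(1 + \Delta_\alpha)$ with the three closed-form factors quoted from Omura and Levitt (soft decisions with jammer state known, hard decisions with and without it), finds the worst-case duty cycle by a grid search, and then bisects on the SNR until the worst-case $R_0$ matches the AWGN reference at $\mathcal{E}_c/N_0 = 2$ dB. The grid search, the bisection, and all names are choices made here. Because the reference $R_0$ is computed rather than read from the figure, the results should land in the neighborhood of the 3, 5, and 10 dB quoted above rather than match them exactly.

```python
from math import erfc, exp, log2, sqrt

def q(x):
    return 0.5 * erfc(x / sqrt(2.0))

def cutoff_rate(delta):
    return 1.0 - log2(1.0 + delta)

# x denotes E_c/J_0 as a linear ratio; alpha is the jammer duty cycle.
def delta_soft_known(alpha, x):
    return alpha * exp(-alpha * x)

def delta_hard_known(alpha, x):
    p = q(sqrt(2.0 * alpha * x))
    return alpha * sqrt(4.0 * p * (1.0 - p))

def delta_hard_unknown(alpha, x):
    p = q(sqrt(2.0 * alpha * x))
    return sqrt(4.0 * alpha * p * (1.0 - alpha * p))

def worst_case_r0(delta_fn, x, steps=2000):
    """Cutoff rate for the duty cycle (on a grid) that maximizes the factor."""
    return cutoff_rate(max(delta_fn(k / steps, x) for k in range(1, steps + 1)))

def required_snr_db(delta_fn, target_r0, lo_db=-10.0, hi_db=30.0):
    """Bisection on E_c/J_0 in dB; the worst-case R_0 grows with SNR."""
    for _ in range(40):
        mid_db = 0.5 * (lo_db + hi_db)
        if worst_case_r0(delta_fn, 10.0 ** (mid_db / 10.0)) < target_r0:
            lo_db = mid_db
        else:
            hi_db = mid_db
    return hi_db

target = cutoff_rate(exp(-(10.0 ** 0.2)))  # AWGN, soft decisions, E_c/N_0 = 2 dB
results = {
    "soft, state known": required_snr_db(delta_soft_known, target),
    "hard, state known": required_snr_db(delta_hard_known, target),
    "hard, state unknown": required_snr_db(delta_hard_unknown, target),
}
for name, snr_db in results.items():
    print(f"{name}: {snr_db:.1f} dB")
```

Adding 3 dB to each result converts $\mathcal{E}_c/J_0$ to $\mathcal{E}_b/J_0$ for a rate 1/2 code, as in the text.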

This general approach may be used to generate error rate graphs for coded binary
signals in a worst-case pulse jamming channel by using corresponding error rate graphs
for the AWGN channel. The approach we describe above is easily generalized to $M$-ary
coded signals as indicated by Omura and Levitt (1982).

By comparing the cutoff rate for coded DS binary PSK modulation shown in
Figure 12.2-14, we note that for rates below 0.7, there is no penalty in SNR with
soft-decision decoding and jammer state information compared with the performance on
the AWGN channel ($\alpha = 1$). On the other hand, at $R_0 = 0.7$, there is a 6-dB difference
in performance between the SNR in an AWGN channel and that required for
hard-decision decoding with no jammer state information. At rates below 0.4, there is no
penalty in SNR with hard-decision decoding if the jammer state is unknown. However,
there is the expected 2-dB loss in hard-decision decoding compared with soft-decision
decoding in the AWGN channel.


12.2-4 Excision of Narrowband Interference in DS Spread Spectrum Systems

We have shown that DS spread spectrum signals reduce the effects of interference 
due to other users of the channel and intentional jamming. When the interference is 
narrowband, the cross correlation of the received signal with the replica of the PN code 
sequence reduces the level of the interference by spreading it across the frequency 
band occupied by the PN signal. Thus, the interference is rendered equivalent to a 
lower-level noise with a relatively flat spectrum. Simultaneously the cross correlation 
operation collapses the desired signal to the bandwidth occupied by the information 
signal prior to spreading. Consequently, the power in the narrowband interference is 
reduced by an amount equal to the processing gain. 

The interference immunity of a DS spread spectrum communication system cor- 
rupted by narrowband interference can be further improved by filtering (whitening) the 
signal prior to despreading, where the objective is to reduce the level of the interference 
at the expense of introducing some distortion on the desired signal. This filtering can 
be accomplished by exploiting the wideband spectral characteristics of the desired DS 
signal and the narrowband characteristic of the interference as described below. 

To be specific, we consider the demodulator illustrated in Figure 12.2-15. The
received signal is passed through a filter matched to the chip pulse $g(t)$. The output of
this filter is synchronously sampled every $T_c$ seconds to yield

$$r_j = 2\mathcal{E}_c(2b_j - 1)(2c_{ij} - 1) + \nu_j, \qquad j = 1, 2, \ldots \tag{12.2-60}$$

where $\mathcal{E}_c$ is the energy of the chip pulse, $\{b_j\}$ is the binary-valued PN sequence, and
$\nu_j$ represents the additive noise and interference term. The additive noise term $\nu_j$ will
be assumed to consist of two terms, one corresponding to a broadband noise (usually
thermal noise) and the other to narrowband interference. Consequently we may express
$r_j$ as

$$r_j = s_j + i_j + n_j \tag{12.2-61}$$

where $s_j$ denotes the signal component, $i_j$ the narrowband interference, and $n_j$ the
broadband noise.

FIGURE 12.2-15
Demodulator for PN spread spectrum signal corrupted by narrowband interference.

The received signal sequence $\{r_j\}$ at the output of the sampler is fed to a
discrete-time filter that estimates the narrowband interference sequence $\{i_j\}$ and subtracts the
estimate $\hat{i}_j$ from $\{r_j\}$. This filter may be either linear or non-linear. The resulting signal
sequence $\{r_j - \hat{i}_j\}$ is then fed to the PN correlator, whose output is passed to the decoder.

Interference estimation and suppression based on linear prediction. The
interference component $i_j$ can be estimated from the received signal by passing it through the
linear transversal filter. Computationally efficient algorithms based on linear
prediction may be used to estimate the interference. Basically, in this method the narrowband
interference is modeled as having been generated by passing white noise through an 
all-pole filter. Hence, the output of this filter is an autoregressive (AR) process. Lin- 
ear prediction is used to estimate the coefficients of the all-pole model. The estimated 
coefficients specify an appropriate noise-whitening all-zero (transversal) filter which 
is used to suppress the narrowband interference. 

Let us assume for the moment that the statistics of the sequence $\{i_j\}$ are known
and that $\{i_j\}$ is a stationary random sequence. Then, because of the narrowband
characteristics of $\{i_j\}$, we can predict $i_j$ from $r_{j-1}, r_{j-2}, \ldots, r_{j-m}$. That is,

$$\hat{i}_j = \sum_{l=1}^{m} a_{ml}\, r_{j-l} \tag{12.2-62}$$

where $\{a_{ml}\}$ are the coefficients of an $m$th-order linear predictor. It should be
emphasized that Equation 12.2-62 predicts the interference but not the signal $s_j$, because the
PN chips are uncorrelated and, hence, $s_j$ is uncorrelated with $r_{j-l}$, $l = 1, 2, \ldots, m$,
where $m$ is less than the length of the PN sequence.

The coefficients in Equation 12.2-62 are determined by minimizing the mean
square error between $r_j$ and $\hat{i}_j$ with respect to the predictor coefficients. This leads to
the set of linear equations, called the Yule-Walker equations,

$$\sum_{l=1}^{m} a_{ml} R(k - l) = R(k), \qquad k = 1, 2, \ldots, m \tag{12.2-63}$$

where $R(k) = E(r_j r_{j+k})$ is the autocorrelation function of the received signal $\{r_j\}$.
The solution of Equation 12.2-63 for the coefficients of the prediction filter requires
knowledge of the autocorrelation function $R(k)$. In practice, the autocorrelation function
of $\{i_j\}$ and, hence, $\{r_j\}$ is usually unknown, and it may also be slowly varying in
time (nonstationary interference). In such a case, adaptive algorithms may be used 
to estimate the narrowband interference. In particular, least-squares-type algorithms, 
such as the Burg algorithm, are especially effective for estimating the coefficients 
of the linear prediction filter adaptively, as described in the paper by Ketchum and 
Proakis (1982). 
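
When the autocorrelation values are available (known or estimated), the Yule-Walker equations can be solved efficiently because the coefficient matrix is symmetric Toeplitz. The sketch below uses the Levinson-Durbin recursion, which is a standard solver for this structure and not the adaptive Burg algorithm mentioned above. The interference scenario (a single sinusoid 20 dB above unit-power white signal-plus-noise) and all names are assumptions chosen for illustration.

```python
from math import cos

def levinson_durbin(R, m):
    """Solve sum_{l=1..m} a_l R(k-l) = R(k), k = 1..m, given R[0..m].

    Returns ([a_1, ..., a_m], prediction error power)."""
    a, err = [], R[0]
    for order in range(1, m + 1):
        acc = R[order] - sum(a[l] * R[order - 1 - l] for l in range(order - 1))
        k = acc / err  # reflection coefficient for this order
        a = [a[l] - k * a[order - 2 - l] for l in range(order - 1)] + [k]
        err *= 1.0 - k * k
    return a, err

# Hypothetical scenario: sinusoidal interferer of power 100 at 0.3 rad/sample,
# plus white signal-plus-noise of unit power (which contributes only to R(0)).
m = 4
R = [100.0 * cos(0.3 * k) + (1.0 if k == 0 else 0.0) for k in range(m + 1)]
a, err = levinson_durbin(R, m)
print([round(c, 3) for c in a], round(err, 3))
```

The residual power `err` is what remains after subtracting the predicted interference; in this example it drops from about 101 to a small multiple of the white component, at the cost of the signal distortion discussed in the text.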

EXAMPLE 12.2-4. Let us consider a narrowband interference that occupies 20 percent
of the spectral band occupied by the PN spread spectrum signal. The average
power of the interference is 20 dB above the average power of the signal. The average 
power of the broadband noise is 20 dB below the average power of the signal. Fig- 
ure 12.2-16 illustrates the spectral characteristics of a 16-tap and a 29-tap FIR filter 
when the interference is equally split into four frequency bands. It is apparent that the 
29-tap filter has better spectral characteristics. In general, the number of taps in the 
filter should be about four times the number of interference bands for adequate suppres- 
sion. It is also apparent that the interference suppression filter acts as a notch filter. In 
effect, it attempts to whiten the total noise plus interference, so that the power spectral 
density of these components at its output is approximately flat. While suppressing the 
interference, the filter also distorts the desired signal by spreading it in time. 



FIGURE 12.2-16 

Frequency-response characteristics of 16- and 29-tap filters for four bands of interference. 

Performance improvement with interference suppression. Since the noise plus
interference at the output of the suppression filter is spectrally flat, the matched filtering or
cross correlation following the suppression filter should be performed with the distorted
signal. This may be accomplished by having a filter matched to the interference
suppression filter, i.e., a discrete-time filter with impulse response
$\{-a_{mm}, -a_{m,m-1}, \ldots, -a_{m1}, 1\}$, followed by the PN correlator. In fact, we can
combine the interference suppression filter and its matched filter into a single filter
having an impulse response


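$$
\begin{aligned}
h_0 &= -a_{mm} \\
h_k &= -a_{m,m-k} + \sum_{l=0}^{k-1} a_{m,m-l}\, a_{m,k-l}, \qquad 1 \le k \le m-1 \\
h_m &= 1 + \sum_{l=1}^{m} a_{ml}^2 \\
h_{m+k} &= h_{m-k}, \qquad 0 \le k \le m
\end{aligned}
\tag{12.2-64}
$$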


The combined filter is a linear phase (symmetric) transversal filter with K = 2m + 1 
taps. The impulse response may be normalized by dividing every term by $h_m$. Thus
the center tap is normalized to unity. In order to demonstrate the effectiveness of the 
interference suppression filter, we compare the performance of the DS system with and 
without the suppression filter. The output SNR is a convenient performance index for 
this purpose. Since the output of the PN correlator is characterized as Gaussian, there 
is a one-to-one correspondence between the SNR and the probability of error. 
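
The combined filter of Equation 12.2-64 can be checked numerically by convolving the suppression filter with its matched filter. The sketch below assumes that the suppression filter is the prediction-error filter $\{1, -a_{m1}, \ldots, -a_{mm}\}$, which is what the matched-filter response quoted above implies; the function name `combined_suppression_filter` is hypothetical.

```python
def combined_suppression_filter(a):
    """Return the normalized taps of the suppression filter cascaded with its matched filter.

    a = [a_m1, ..., a_mm] are predictor coefficients. The suppression filter is
    taken to be {1, -a_m1, ..., -a_mm} (an assumption of this sketch), and the
    matched filter is its time reverse. The result has 2m + 1 taps and is
    scaled so that the center tap equals 1.
    """
    f = [1.0] + [-x for x in a]       # suppression (prediction-error) filter
    g = f[::-1]                       # matched filter: time-reversed copy
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):        # direct convolution f * g
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    center = h[len(a)]                # equals 1 + sum of squared a_ml
    return [x / center for x in h]


taps = combined_suppression_filter([0.5, -0.25])   # illustrative coefficients only
```

Because the second filter is the time reverse of the first, the result is symmetric about the center tap, which is the linear-phase property stated in the text.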

Without the suppression filter, the PN correlator output, denoted as $U_1$, has mean
$2\mathcal{E}_c L_c$ and variance $L_c(2\mathcal{E}_c N_0 + R_{ii}(0))$, where $R_{ii}(k)$ is the autocorrelation
function of the sequence $\{i_j\}$ and $L_c$ is the number of chips per bit or per symbol. The output
SNR is defined as the ratio of the square of the mean to twice the variance. Hence the
SNR without the suppression filter is

$$\text{SNR}_{no} = \frac{\mathcal{E}_c L_c}{N_0 + R_{ii}(0)/2\mathcal{E}_c} \tag{12.2-65}$$


With an interference suppression filter having a symmetric impulse response as
defined in Equation 12.2-64 and normalized such that the center tap is unity, the mean
value of the correlator output is also $2\mathcal{E}_c L_c$. However, the variance of the output now
consists of three terms. One corresponds to the additive wideband noise, the second to
the residual narrowband interference, and the third to a self-noise caused by the time
dispersion introduced by the suppression filter. The expression for the variance can be
shown to be (see Ketchum and Proakis [1982]):

$$\text{VAR}[U_1] = 2L_c\mathcal{E}_c N_0 \sum_{k=0}^{K} h_k^2 + L_c \sum_{k=0}^{K}\sum_{l=0}^{K} h(l)h(k)R_{ii}(k-l) + 4L_c\mathcal{E}_c^2 \sum_{k=0}^{K/2-1} (2 - k/L_c)\, h_k^2 \tag{12.2-66}$$

Hence the output SNR with the filter is the ratio of the square of the mean to twice the
variance. The ratio of the SNR with the filter to the SNR without the filter is

$$\eta_0 = \frac{N_0 + R_{ii}(0)/2\mathcal{E}_c}{N_0 \displaystyle\sum_{k=0}^{K} h_k^2 + \frac{1}{2\mathcal{E}_c} \sum_{k=0}^{K}\sum_{l=0}^{K} h(k)h(l)R_{ii}(k-l) + 2\mathcal{E}_c \sum_{k=0}^{K/2-1} (2 - k/L_c)\, h_k^2} \tag{12.2-67}$$

This ratio is called the improvement factor resulting from interference suppression. It
may be plotted against the normalized SNR per chip without filtering, defined as

$$\frac{\text{SNR}_{no}}{L_c} = \frac{\mathcal{E}_c}{N_0 + R_{ii}(0)/2\mathcal{E}_c} \tag{12.2-68}$$

The resulting graph of $\eta_0$ versus $\text{SNR}_{no}/L_c$ is universal in the sense that it applies to
any PN spread spectrum system with arbitrary processing gain for a given $\mathcal{E}_c$, $N_0$, and
$R_{ii}(0)$.

As an example, the improvement factor (in decibels) is plotted against $\text{SNR}_{no}/L_c$
in Figure 12.2-17 for a single band of equal-amplitude, randomly phased sinusoids
covering 20 percent of the frequency band occupied by the DS spread spectrum signal.
The combined interference suppression filter has nine taps, which corresponds
to a fourth-order predictor. These numerical results indicate that the notch filter
is very effective in suppressing the interference prior to PN correlation and decoding.
As a consequence, the interference margin of the system is increased.


FIGURE 12.2-17

Improvement factor for interference suppression filter in cascade with its matched filter.
(Horizontal axis: SNR/chip without filter, dB.)

The use of a linear adaptive FIR filter for suppression of narrowband interference
in DS spread spectrum systems has been considered in the literature by many authors.
The interested reader is referred to the literature cited in Section 12.6. A practical
motivation for excision of narrowband signals from wideband signals is to allow the 
overlay of narrowband digital cellular systems with wideband CDMA systems. 


Interference estimation and suppression based on non-linear filtering. The linear
FIR filter used to predict the narrowband interference, which is modeled as a Gaussian
autoregressive (AR) process, is the optimal minimum mean-square-error filter when
the signal $\{s_k\}$ and broadband noise $\{n_k\}$ components are Gaussian random processes.
However, the DS spread spectrum signal sequence $\{s_k\}$ is non-Gaussian. Consequently,
the linear estimation filter is suboptimal, in the sense that it is not the best filter for
suppressing the narrowband interference. The optimum estimator for the narrowband
interference is non-linear.

By defining the state vector $\mathbf{x}_k$ as

$$\mathbf{x}_k = [i_k \;\; i_{k-1} \;\; \cdots \;\; i_{k-m+1}]^T \tag{12.2-69}$$

where m is the order of the AR model, it is possible to express the state vector and the
observation sequence in the state-space form

$$
\begin{aligned}
\mathbf{x}_k &= \boldsymbol{\Phi}\,\mathbf{x}_{k-1} + \mathbf{w}_k \\
r_k &= \mathbf{H}\,\mathbf{x}_k + (n_k + s_k)
\end{aligned}
\tag{12.2-70}
$$

where $\boldsymbol{\Phi}$ is the state transition matrix that depends on the AR model parameters, $\mathbf{w}_k$ is
the white Gaussian process driving the AR model, and $\mathbf{H} = [1\ 0\ 0\ \cdots\ 0]$. We recall that
the minimum mean-square-error estimator for the state at time k given the observations
$\mathbf{r}_{k-1} = [r_{k-1}, r_{k-2}, \ldots, r_0]$ is the conditional mean $E(\mathbf{x}_k \mid \mathbf{r}_{k-1})$. If the signal sequence
$\{s_k\}$ and the broadband noise sequence $\{n_k\}$ were Gaussian, the optimum estimator for
the state $\mathbf{x}_k$ corresponding to the conditional mean would be the linear predictor obtained
from the Kalman filter. Since $\{s_k\}$ is non-Gaussian, the conditional mean estimate is a
non-linear function of the observations which, in general, is highly complex. However,
it is possible to derive a reduced complexity approximation to the conditional mean 
estimate. This approach has been described in the papers by Vijayan and Poor (1990), 
Garth and Poor (1992), Rusch and Poor (1994), and Poor and Rusch (1994). The 
general configuration of the approximate conditional mean non-linear filter is shown in 
Figure 12.2-18. The non-linear function tanh(x) provides a soft-decision type feedback 
signal component. An analysis and simulation results of the performance of this type 
of non-linear filter for suppression of narrowband interference are given in the papers 
cited above. 


FIGURE 12.2-18

Non-linear excision filter.


12.2-5 Generation of PN Sequences

The generation of PN sequences for spread spectrum applications is a topic that has
received considerable attention in the technical literature. We shall briefly discuss the
construction of some PN sequences and present a number of important properties of the
autocorrelation and cross-correlation functions of such sequences. For a comprehensive
treatment of this subject, the interested reader may refer to the book by Golomb (1967).

By far the most widely known binary PN sequences are the maximum-length shift-register
sequences introduced in Section 7.9-5 in the context of coding. A maximum-length
shift-register sequence, or m-sequence for short, has length $n = 2^m - 1$ bits
and is generated by an m-stage shift register with linear feedback as illustrated in
Figure 12.2-19. The sequence is periodic with period n. Each period of the sequence
contains $2^{m-1}$ ones and $2^{m-1} - 1$ zeros.

In DS spread spectrum applications the binary sequence with elements {0, 1} is
mapped into a corresponding sequence of positive and negative pulses according to the
relation

$$p_i(t) = (2b_i - 1)\,p(t - iT)$$

where $p_i(t)$ is the pulse corresponding to the element $b_i$ in the sequence with elements
{0, 1}. Equivalently, we may say that the binary sequence with elements {0, 1} is mapped
into a corresponding binary sequence with elements {-1, 1}. We shall call the equivalent
sequence with elements {-1, 1} a bipolar sequence, since it results in pulses of positive
and negative amplitudes.

FIGURE 12.2-19

General m-stage shift register with linear feedback.

An important characteristic of a periodic PN sequence is its periodic autocorrelation
function, which is usually defined in terms of the bipolar sequence as

$$R(j) = \sum_{i=1}^{n} (2b_i - 1)(2b_{i+j} - 1), \qquad 0 \le j \le n - 1 \tag{12.2-71}$$

where n is the period. Clearly, $R(j + rn) = R(j)$ for any integer value r.

Ideally, a pseudorandom sequence should have an autocorrelation function with
the property that $R(0) = n$ and $R(j) = 0$ for $1 \le j \le n - 1$. In the case of m sequences,
the periodic autocorrelation function is

$$R(j) = \begin{cases} n & j = 0 \\ -1 & 1 \le j \le n - 1 \end{cases} \tag{12.2-72}$$

For large values of n, i.e., for long m sequences, the size of the off-peak values of $R(j)$
relative to the peak value, $R(j)/R(0) = -1/n$, is small and, from a practical viewpoint,
inconsequential. Therefore, m sequences are almost ideal when viewed in terms of their
autocorrelation function.

In antijamming applications of PN spread spectrum signals, the period of the
sequence must be large in order to prevent the jammer from learning the feedback
connections of the PN generator. However, this requirement is impractical in most
cases because the jammer can determine the feedback connections by observing only
$2m - 1$ chips from the PN sequence. This vulnerability of the PN sequence is due to the
linearity property of the generator. To reduce the vulnerability to a jammer, the output
sequences from several stages of the shift register or the outputs from several distinct
m sequences are combined in a non-linear way to produce a non-linear sequence that is
considerably more difficult for the jammer to learn. Further reduction in vulnerability
is achieved by frequently changing the feedback connections and/or the number of
stages in the shift register according to some prearranged plan formulated between the
transmitter and the intended receiver.

In some applications, the cross-correlation properties of PN sequences are as
important as the autocorrelation properties. For example, in CDMA, each user is assigned
a particular PN sequence. Ideally, the PN sequences among users should be mutually
orthogonal so that the level of interference experienced by any one user from
transmissions of other users adds on a power basis. However, the PN sequences used in practice
exhibit some correlation.

To be specific, we consider the class of m sequences. It is known (Sarwate and
Pursley, 1980) that the periodic cross-correlation function between any pair of m
sequences of the same period can have relatively large peaks. Table 12.2-1 lists the peak
magnitude $R_{\max}$ for the periodic cross correlation between pairs of m sequences for
$3 \le m \le 12$. The table also shows the number of m sequences of length $n = 2^m - 1$ for
$3 \le m \le 12$. As we can see, the number of m sequences of length n increases rapidly
with m. We also observe that, for most sequences, the peak magnitude $R_{\max}$ of the
cross-correlation function is a large percentage of the peak value of the autocorrelation
function.
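
The m-sequence properties described above (period $2^m - 1$, the balance of ones and zeros, and the two-valued periodic autocorrelation) can be checked with a short Python sketch. The register layout, the all-but-one-zero starting state, and the tap convention (feedback is the modulo-2 sum of the listed stages) are assumptions of this example and need not match Figure 12.2-19; the taps (5, 3) are used only because they give a maximal-length sequence under this convention.

```python
def m_sequence(taps, m):
    """One period (2**m - 1 bits) from an m-stage shift register with linear feedback.

    taps lists the 1-based stages whose contents are added modulo 2 to form
    the bit fed back into stage 1. The output is taken from stage m.
    """
    state = [1] + [0] * (m - 1)          # any nonzero starting state works
    bits = []
    for _ in range(2 ** m - 1):
        bits.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift by one stage
    return bits


def periodic_autocorrelation(bits):
    """R(j), j = 0..n-1, computed on the bipolar sequence 2*b_i - 1."""
    n = len(bits)
    c = [2 * b - 1 for b in bits]
    return [sum(c[i] * c[(i + j) % n] for i in range(n)) for j in range(n)]


seq = m_sequence((5, 3), 5)
R = periodic_autocorrelation(seq)
```

For this register, `R[0]` equals the period and every other entry equals -1, in agreement with Equation 12.2-72.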

TABLE 12.2-1

Peak Cross Correlation of m Sequences and Gold Sequences

  m    n = 2^m - 1    Number of      Peak cross            R_max/R(0)    t(m)    t(m)/R(0)
                      m sequences    correlation R_max
  3          7              2               5                 0.71          5       0.71
  4         15              2               9                 0.60          9       0.60
  5         31              6              11                 0.35          9       0.29
  6         63              6              23                 0.36         17       0.27
  7        127             18              41                 0.32         17       0.13
  8        255             16              95                 0.37         33       0.13
  9        511             48             113                 0.22         33       0.06
 10       1023             60             383                 0.37         65       0.06
 11       2047            176             287                 0.14         65       0.03
 12       4095            144            1407                 0.34        129       0.03


Such high values for the cross correlations are undesirable in CDMA. Although it 
is possible to select a small subset of m sequences that have relatively smaller cross- 
correlation peak values, the number of sequences in the set is usually too small for 
CDMA applications. 

PN sequences with better periodic cross-correlation properties than m sequences 
have been given by Gold (1967, 1968) and Kasami (1966). They are derived from m 
sequences as described below. 

Gold and Kasami proved that certain pairs of m sequences of length n exhibit a
three-valued cross-correlation function with values $\{-1, -t(m), t(m) - 2\}$, where

$$t(m) = \begin{cases} 2^{(m+1)/2} + 1 & \text{odd } m \\ 2^{(m+2)/2} + 1 & \text{even } m \end{cases} \tag{12.2-73}$$


For example, if m = 10, then $t(10) = 2^6 + 1 = 65$ and the three possible values of
the periodic cross-correlation function are {-1, -65, 63}. Hence the maximum cross
correlation for the pair of m sequences is 65, while the peak for the family of 60
possible sequences generated by a 10-stage shift register with different feedback
connections is $R_{\max} = 383$, about a sixfold difference in peak values. Two m sequences
of length n with a periodic cross-correlation function that takes on the possible values
$\{-1, -t(m), t(m) - 2\}$ are called preferred sequences.

From a pair of preferred sequences, say $\mathbf{a} = [a_1\ a_2\ \cdots\ a_n]$ and $\mathbf{b} = [b_1\ b_2\ \cdots\ b_n]$,
we construct a set of sequences of length n by taking the modulo-2 sum of a with the n
cyclically shifted versions of b or vice versa. Thus, we obtain n new periodic sequences†
with period $n = 2^m - 1$. We may also include the original sequences a and b and, thus,
we have a total of n + 2 sequences. The n + 2 sequences constructed in this manner
are called Gold sequences.


†An equivalent method for generating the n new sequences is to employ a shift register of length 2m
with feedback connections specified by the polynomial $h(X) = h_1(X)h_2(X)$, where $h_1(X)$ and $h_2(X)$ are
the polynomials that specify the feedback connections of the m-stage shift registers that generate the m
sequences a and b.

EXAMPLE 12.2-5. Let us consider the generation of Gold sequences of length
$n = 31 = 2^5 - 1$. As indicated above for m = 5, the cross-correlation peak is

$$t(5) = 2^3 + 1 = 9$$

Two preferred sequences, which may be obtained from Peterson and Weldon (1972),
are described by the parity polynomials

$$h_1(X) = X^5 + X^3 + 1$$

$$h_2(X) = X^5 + X^4 + X^3 + X + 1$$

The shift registers for generating the two m sequences and the corresponding Gold
sequences are shown in Figure 12.2-20. In this case, there are 33 different sequences,
corresponding to the 33 relative phases of the two m sequences. Of these, 31 sequences
are non-maximal-length sequences.

With the exception of the sequences a and b, the set of Gold sequences is not
comprised of maximum-length shift-register sequences of length n. Hence, their
autocorrelation functions are not two-valued. Gold (1968) has shown that the cross-correlation
function for any pair of sequences from the set of n + 2 Gold sequences is three-valued
with possible values $\{-1, -t(m), t(m) - 2\}$, where $t(m)$ is given by Equation 12.2-73.

Similarly, the off-peak autocorrelation function for a Gold sequence takes on values
from the set $\{-1, -t(m), t(m) - 2\}$. Hence, the off-peak values of the autocorrelation
function are upper-bounded by $t(m)$.

The values of the off-peak autocorrelation function and the peak cross-correlation
function, i.e., $t(m)$, for Gold sequences are listed in Table 12.2-1. Also listed are the
values normalized by $R(0)$.

The frequency of occurrence for each of the three possible values of the cross 
correlation for any pair of Gold sequences may also be of interest to the system designer. 
In Table 12.2-2, we give the frequency of occurrence of the three values for the case 
in which m is odd. 
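
The Gold construction for m = 5 can be sketched in a few lines of Python. The tap lists below are read off the exponents of the two polynomials in Example 12.2-5 under an assumed convention (the feedback bit is the modulo-2 sum of the listed stages); whether this matches the register wiring of Figure 12.2-20 is not confirmed by the text, and all function names are hypothetical.

```python
from collections import Counter


def m_sequence(taps, m):
    """One period of an m-stage linear-feedback shift-register sequence."""
    state = [1] + [0] * (m - 1)
    bits = []
    for _ in range(2 ** m - 1):
        bits.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return bits


def cross_correlation(x, y):
    """Periodic cross correlation of the bipolar versions of x and y, all n shifts."""
    n = len(x)
    cx = [2 * b - 1 for b in x]
    cy = [2 * b - 1 for b in y]
    return [sum(cx[i] * cy[(i + j) % n] for i in range(n)) for j in range(n)]


def gold_set(a, b):
    """a, b, and the modulo-2 sums of a with every cyclic shift of b (n + 2 sequences)."""
    n = len(a)
    family = [a, b]
    for shift in range(n):
        family.append([a[i] ^ b[(i + shift) % n] for i in range(n)])
    return family


a = m_sequence((5, 3), 5)            # exponents of X^5 + X^3 + 1 (assumed convention)
b = m_sequence((5, 4, 3, 1), 5)      # exponents of X^5 + X^4 + X^3 + X + 1
family = gold_set(a, b)
histogram = Counter(cross_correlation(a, b))
```

The keys of `histogram` are the three cross-correlation values of the pair, and the counts give their frequency of occurrence over the 31 relative shifts.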

FIGURE 12.2-20

Generation of Gold sequences of length 31.

TABLE 12.2-2

Frequency of Occurrence of Cross-Correlation
Values for Gold Codes of Length n = 2^m - 1, m Odd

Cross-correlation value        Frequency of occurrence
-1                             2^(m-1) - 1
-[2^((m+1)/2) + 1]             2^(m-2) - 2^((m-3)/2)
2^((m+1)/2) - 1                2^(m-2) + 2^((m-3)/2)

It is interesting to compare the peak cross-correlation value of Gold sequences with
a known lower bound on the cross-correlation between any pair of binary sequences
of period n in a set of M sequences. A lower bound derived by Welch (1974) for
$R_{\max}$ is

$$R_{\max} \ge n\sqrt{\frac{M - 1}{Mn - 1}} \tag{12.2-74}$$

which, for large values of n and M, is well approximated as $\sqrt{n}$. For Gold sequences,
$M = 2^m + 1$, $n = 2^m - 1$, and the lower bound is $R_{\max} \approx 2^{m/2}$. This bound is lower
by a factor of $\sqrt{2}$ for odd m and by a factor of 2 for even m relative to $R_{\max} = t(m)$ for Gold sequences.
Therefore, Gold sequences do not achieve the lower bound.
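
The comparison between $t(m)$ and the Welch bound is easy to reproduce numerically. The following Python sketch evaluates Equations 12.2-73 and 12.2-74 for Gold sets; the function names `t` and `welch_bound` are chosen for this example only.

```python
import math


def t(m):
    """Peak magnitude t(m) of Equation 12.2-73."""
    if m % 2:
        return 2 ** ((m + 1) // 2) + 1   # odd m
    return 2 ** ((m + 2) // 2) + 1       # even m


def welch_bound(n, M):
    """Welch lower bound on R_max for a set of M binary sequences of period n."""
    return n * math.sqrt((M - 1) / (M * n - 1))


for m in (10, 11):
    n = 2 ** m - 1
    M = n + 2                             # size of the Gold set
    print(m, t(m), round(welch_bound(n, M), 2), round(t(m) / welch_bound(n, M), 3))
```

For m = 10 the ratio of $t(m)$ to the bound is close to 2, and for m = 11 it is close to $\sqrt{2}$, in line with the statement above.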

A procedure similar to that used for generating Gold sequences will generate a
smaller set of $M = 2^{m/2}$ binary sequences of period $n = 2^m - 1$, where m is even.
In this procedure, we begin with an m sequence a and we form a binary sequence b
by taking every $(2^{m/2} + 1)$st bit of a. Thus, the sequence b is formed by decimating a
by $2^{m/2} + 1$. It can be verified that the resulting sequence b is periodic with period
$2^{m/2} - 1$. For example, if m = 10, the period of a is n = 1023 and the period of b is
31. Hence, if we observe 1023 bits of the sequence b, we shall see 33 repetitions of the
31-bit sequence. Now, by taking $n = 2^m - 1$ bits of the sequences a and b, we form a
new set of sequences by adding, modulo-2, the bits from a and the bits from b and all
$2^{m/2} - 2$ cyclic shifts of the bits from b. By including a in the set, we obtain a set of
$2^{m/2}$ binary sequences of length $n = 2^m - 1$. These are called Kasami sequences. The
autocorrelation and cross-correlation functions of these sequences take on values from
the set $\{-1, -(2^{m/2} + 1), 2^{m/2} - 1\}$. Hence, the maximum cross-correlation value for
any pair of sequences from the set is

$$R_{\max} = 2^{m/2} + 1 \tag{12.2-75}$$

This value of $R_{\max}$ satisfies the Welch lower bound for a set of $2^{m/2}$ sequences of length
$n = 2^m - 1$. Hence, the Kasami sequences are optimal.

Besides the well-known Gold and Kasami sequences, there are other binary
sequences appropriate for CDMA applications. The interested reader may refer to the
work of Scholtz (1979), Olsen (1977), and Sarwate and Pursley (1980).

Finally, we wish to indicate that, although we have discussed the periodic
cross-correlation function between pairs of periodic sequences, many practical CDMA
systems may use information bit durations that encompass only fractions of a periodic
sequence. In such cases, it is the partial-period cross correlation between two sequences
that is important. A number of papers deal with this problem, including those by
Lindholm (1968), Wainberg and Wolf (1970), Fredricsson (1975), Bekir et al. (1978), and
Pursley (1979).
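
The Kasami construction described above can be tried on the smallest nontrivial case, m = 4 (n = 15). In this Python sketch the taps (4, 1), the starting state of the register, and the function names are assumptions made for the example; with this particular starting state the decimated sequence is not identically zero, which the construction requires.

```python
def m_sequence(taps, m):
    """One period of an m-stage linear-feedback shift-register sequence."""
    state = [1] + [0] * (m - 1)
    bits = []
    for _ in range(2 ** m - 1):
        bits.append(state[-1])
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]
    return bits


def kasami_set(a, m):
    """Small Kasami set built from the m sequence a (m even)."""
    n = len(a)
    q = 2 ** (m // 2) + 1                    # decimation factor
    period_b = 2 ** (m // 2) - 1             # period of the decimated sequence
    b = [a[(q * i) % n] for i in range(n)]   # every q-th bit of a, taken cyclically
    family = [a]
    for shift in range(period_b):            # b and its 2^(m/2) - 2 cyclic shifts
        family.append([a[i] ^ b[(i + shift) % n] for i in range(n)])
    return family


def correlation(x, y):
    """Periodic correlation of the bipolar versions of x and y, all n shifts."""
    n = len(x)
    cx = [2 * v - 1 for v in x]
    cy = [2 * v - 1 for v in y]
    return [sum(cx[i] * cy[(i + j) % n] for i in range(n)) for j in range(n)]


kasami = kasami_set(m_sequence((4, 1), 4), 4)
```

The set contains $2^{m/2} = 4$ sequences, and the off-peak autocorrelation and the cross-correlation values can be inspected with `correlation` to compare against Equation 12.2-75.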

In a frequency-hopped (FH) spread spectrum communication system the available channel
bandwidth is subdivided into a large number of contiguous frequency slots. In any
signaling interval, the transmitted signal occupies one or more of the available fre- 
quency slots. The selection of the frequency slot(s) in each signaling interval is made 
pseudorandomly according to the output from a PN generator. Figure 12.3-1 illustrates 
a particular FH pattern in the time-frequency plane.

A block diagram of the transmitter and receiver for an FH spread spectrum system 
is shown in Figure 12.3-2. The modulation is usually either binary or M-ary FSK.
For example, if binary FSK is employed, the modulator selects one of two frequencies 
corresponding to the transmission of either a 1 or a 0. The resulting FSK signal is 
translated in frequency by an amount that is determined by the output sequence from 
the PN generator, which, in turn, is used to select a frequency that is synthesized by the 
frequency synthesizer. This frequency is mixed with the output of the modulator and the 
resultant frequency-translated signal is transmitted over the channel. For example, m 
bits from the PN generator may be used to specify $2^m - 1$ possible frequency translations.

At the receiver, we have an identical PN generator, synchronized with the received
signal, which is used to control the output of the frequency synthesizer. Thus, the 
pseudorandom frequency translation introduced at the transmitter is removed at the 
receiver by mixing the synthesizer output with the received signal. The resultant signal 
is demodulated by means of an FSK demodulator. A signal for maintaining synchronism 
of the PN generator with the frequency-translated received signal is usually extracted 
from the received signal. 

Although PSK modulation gives better performance than FSK in an AWGN chan- 
nel, it is sometimes difficult to maintain phase coherence in the synthesis of the fre- 
quencies used in the hopping pattern and, also, in the propagation of the signal over the 
channel as the signal is hopped from one frequency to another over a wide bandwidth. 
Consequently, FSK modulation with noncoherent detection is often employed with FH 
spread spectrum signals. 



FIGURE 12.3-1

An example of a frequency-hopped (FH) pattern.

FIGURE 12.3-2

Block diagram of an FH spread spectrum system. 


In the FH system depicted in Figure 12.3-2, the carrier frequency is pseudorandomly
hopped in every signaling interval. The M information-bearing tones are contiguous
and separated in frequency by $1/T_c$, where $T_c$ is the signaling interval. This
type of frequency hopping is called block hopping.

Another type of frequency hopping that is less vulnerable to some jamming strate- 
gies is independent tone hopping. In this scheme, the M possible tones from the mod- 
ulator are assigned widely dispersed frequency slots. One method for accomplishing 
this is illustrated in Figure 12.3-3. Here, the m bits from the PN generator and the k
information bits are used to specify the frequency slots for the transmitted signal. 

The FH rate is usually selected to be either equal to the (coded or uncoded) symbol 
rate or faster than that rate. If there are multiple hops per symbol, we have a fast-hopped 
signal. On the other hand, if the hopping is performed at the symbol rate, we have a 
slow-hopped signal. 

Fast frequency hopping is employed in AJ applications when it is necessary to
prevent a type of jammer, called a follower jammer, from having sufficient time to
intercept the frequency and retransmit it along with adjacent frequencies so as to create
interfering signal components. However, there is a penalty incurred in subdividing a
signal into several FH elements because the energy from these separate elements is
combined noncoherently. Consequently, the demodulator incurs a penalty in the form
of a noncoherent combining loss as described in Section 11.1.

FIGURE 12.3-3

Block diagram of an independent tone FH spread spectrum system.

FH spread spectrum signals are used primarily in digital communication systems
that require AJ protection and in CDMA, where many users share a common bandwidth.
In most cases, an FH signal is preferred over a DS spread spectrum signal because of
the stringent synchronization requirements inherent in DS spread spectrum signals.
Specifically, in a DS system, timing and synchronization must be established to within
a fraction of the chip interval $T_c \approx 1/W$. On the other hand, in an FH system, the
chip interval is the time spent in transmitting a signal in a particular frequency slot of
bandwidth $B \ll W$. But this interval is approximately $1/B$, which is much larger than
$1/W$. Hence the timing requirements in an FH system are not as stringent as in a DS
system.

In Sections 12.3-2 and 12.3-3, we shall focus on the AJ and CDMA applications 
of FH spread spectrum signals. First, we shall determine the error rate performance of 
an uncoded and a coded FH signal in the presence of broadband AWGN interference.
Then we shall consider a more serious type of interference that arises in AJ and CDMA 
applications, called partial-band interference. The benefits obtained from coding for 
this type of interference are determined. We conclude the discussion in Section 12.3-3 
with an example of an FH CDMA system that was designed for use by mobile users 
with a satellite serving as the channel. 


12.3-1 Performance of FH Spread Spectrum Signals in an AWGN Channel 

Let us consider the performance of an FH spread spectrum signal in the presence
of broadband interference characterized statistically as AWGN with power spectral
density $J_0$. For binary orthogonal FSK with noncoherent detection and slow frequency
hopping (1 hop/bit), the probability of error, derived in Section 4.5-3, is

$$P_2 = \tfrac{1}{2} e^{-\gamma_b/2} \tag{12.3-1}$$

where $\gamma_b = \mathcal{E}_b/J_0$. On the other hand, if the bit interval is subdivided into L subintervals
and FH binary FSK is transmitted in each subinterval, we have a fast FH signal. With
square-law combining of the output signals from the corresponding matched filters for
the L subintervals, the error rate performance of the FH signal, obtained from the results
in Section 11.1, is

$$P_2(L) = \frac{1}{2^{2L-1}}\, e^{-\gamma_b/2} \sum_{i=0}^{L-1} K_i \left(\frac{\gamma_b}{2}\right)^i \tag{12.3-2}$$

where the SNR per bit is $\gamma_b = \mathcal{E}_b/J_0 = L\gamma_c$, $\gamma_c$ is the SNR per chip in the L-chip
symbol, and

$$K_i = \frac{1}{i!} \sum_{r=0}^{L-1-i} \binom{2L-1}{r} \tag{12.3-3}$$

We recall that, for a given SNR per bit $\gamma_b$, the error rate obtained from
Equation 12.3-2 is larger than that obtained from Equation 12.3-1. The difference in SNR
for a given error rate and a given L is called the noncoherent combining loss, which
was described and illustrated in Section 11.1.
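
To make the noncoherent combining loss concrete, Equations 12.3-2 and 12.3-3 can be evaluated directly. The Python sketch below is a plain transcription of those two formulas; the function name `p2_fast_fh` and the sample SNR of 10 (a linear ratio, not decibels) are choices made for this example.

```python
import math


def p2_fast_fh(gamma_b, L):
    """Error probability of Equation 12.3-2 for L hops per bit.

    gamma_b is the SNR per bit as a linear ratio. For L = 1 the expression
    reduces to Equation 12.3-1.
    """
    total = 0.0
    for i in range(L):
        # K_i of Equation 12.3-3: partial binomial sum divided by i!
        k_i = sum(math.comb(2 * L - 1, r) for r in range(L - i)) / math.factorial(i)
        total += k_i * (gamma_b / 2.0) ** i
    return math.exp(-gamma_b / 2.0) * total / 2 ** (2 * L - 1)


for hops in (1, 2, 4):
    print(hops, p2_fast_fh(10.0, hops))
```

At a fixed SNR per bit the printed error probability grows as the number of hops per bit increases, which is the loss referred to above.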

Coding improves the performance of the FH spread spectrum system by an amount,
which we call the coding gain, that depends on the code parameters. Suppose we use a
linear binary (n, k) block code and binary FSK modulation with one hop per coded bit
for transmitting the bits. With soft-decision decoding of the square-law-demodulated
FSK signal, the probability of a codeword error is upper-bounded as

$$P_e \le \sum_{m=2}^{M} P_2(m) \tag{12.3-4}$$

where $P_2(m)$ is the error probability in deciding between the mth codeword and the
all-zero codeword when the latter has been transmitted. The expression for $P_2(m)$ was
derived in Section 7.4 and has the same form as Equations 12.3-2 and 12.3-3, with L
being replaced by $w_m$ and $\gamma_b$ by $\gamma_b R_c w_m$, where $w_m$ is the weight of the mth codeword
and $R_c$ is the code rate. The product $R_c w_m$, which is not less than $R_c d_{\min}$, represents
the coding gain. Thus, we have the performance of a block coded FH system with slow
frequency hopping in broadband interference.

The probability of error for fast frequency hopping with $n_2$ hops per coded bit is
obtained by reinterpreting the binary event probability $P_2(m)$ in Equation 12.3-4. The
$n_2$ hops per coded bit may be interpreted as a repetition code, which, when combined
with a nontrivial $(n_1, k)$ binary linear code having weight distribution $\{w_m\}$, yields
an $(n_1 n_2, k)$ binary linear code having weight distribution $\{n_2 w_m\}$. Hence, $P_2(m)$ has
the form given in Equation 12.3-2, with L replaced by $n_2 w_m$ and $\gamma_b$ by $\gamma_b R_c n_2 w_m$,
where $R_c = k/n_1 n_2$. Note that $\gamma_b R_c n_2 w_m = \gamma_b w_m k/n_1$, which is just the coding gain
obtained from the nontrivial $(n_1, k)$ code. Consequently, the use of the repetition code
will result in an increase in the noncoherent combining loss.

With hard-decision decoding and slow frequency hopping, the probability of a
coded bit error at the output of the demodulator for noncoherent detection is

$$p = \tfrac{1}{2} e^{-\gamma_b R_c/2} \tag{12.3-5}$$

The codeword error probability is easily upper-bounded, by use of the Chernoff bound,
as

$$P_e \le \sum_{m=2}^{M} [4p(1 - p)]^{w_m/2} \tag{12.3-6}$$

However, if fast frequency hopping is employed with $n_2$ hops per coded bit, and the
square-law-detected outputs from the corresponding matched filters for the $n_2$ hops are
added as in soft-decision decoding to form the two decision variables for the coded bits,
the bit error probability p is also given by Equation 12.3-2, with L replaced by $n_2$ and
$\gamma_b$ replaced by $\gamma_b R_c n_2$, where $R_c$ is the rate of the nontrivial $(n_1, k)$ code. Consequently,
the performance of the fast FH system in broadband interference is degraded relative
to the slow FH system by an amount equal to the noncoherent combining loss of the
signals received from the $n_2$ hops.

We have observed that for both hard-decision and soft-decision decoding, the use
of the repetition code in a fast FH system yields no coding gain. The only coding gain
obtained comes from the $(n_1, k)$ block code. Hence, the repetition code is inefficient
in a fast FH system with noncoherent combining. A more efficient coding method is
one in which either a single low-rate binary code or a concatenated code is employed.
Additional improvements in performance may be obtained by using nonbinary codes
in conjunction with M-ary FSK. Bounds on the error probability for this case may be
obtained from the results given in Section 11.1.

Although we have evaluated the performance of linear block codes only in the 
above discussion, it is relatively easy to derive corresponding performance results for 
binary convolutional codes. We leave as an exercise for the reader the derivation of 
the bit error probability for soft-decision Viterbi decoding and hard-decision Viterbi 
decoding of FH signals corrupted by broadband interference. 

Finally, we observe that $\mathcal{E}_b$, the energy per bit, can be expressed as $\mathcal{E}_b = P_{av}/R$,
where R is the information rate in bits per second, and that $J_0 = J_{av}/2W$. Therefore, $\gamma_b$
may be expressed as

$$\gamma_b = \frac{\mathcal{E}_b}{J_0} = \frac{2W/R}{J_{av}/P_{av}} \tag{12.3-7}$$

In this expression, we recognize $W/R$ as the processing gain and $J_{av}/P_{av}$ as the
interference margin for the FH spread spectrum signal.


12.3-2 Performance of FH Spread Spectrum Signals 
in Partial-Band Interference 

The partial-band interference considered in this subsection is modeled as a zero-mean
Gaussian random process with a flat power spectral density over a fraction $\alpha$ of the total
bandwidth W and zero elsewhere. In the region or regions where the power spectral
density is nonzero, its value is $R_{zz}(f) = 2J_0/\alpha$, $0 < \alpha \le 1$. This model of the
interference may be applied to a jamming signal or to interference from other users in
an FH CDMA system.

Suppose that the partial-band interference comes from a jammer who may select
$\alpha$ to optimize the effect on the communication system. In an uncoded pseudorandomly
hopped (slow-hopping) FH system with binary FSK modulation and noncoherent
detection, the received signal will be jammed with probability $\alpha$ and it will not be jammed
with probability $1 - \alpha$. When it is jammed, the probability of error is
$\tfrac{1}{2}\exp(-\alpha\mathcal{E}_b/2J_0)$, and when it is not jammed, the demodulation is error-free.
Consequently, the average probability of error is

$$P_2(\alpha) = \frac{\alpha}{2} \exp\left(-\frac{\alpha\mathcal{E}_b}{2J_0}\right) \tag{12.3-8}$$

where $\mathcal{E}_b/J_0$ may also be expressed as $(2W/R)/(J_{av}/P_{av})$.

FIGURE 12.3-4

Performance of binary FSK with partial-band interference.
(Horizontal axis: SNR per bit, $\gamma_b$ (dB).)


Figure 12.3-4 illustrates the error rate as a function of $\mathcal{E}_b/J_0$ for several values
of $\alpha$. The jammer's optimum strategy is to select the value of $\alpha$ that maximizes the
error probability. By differentiating $P_2(\alpha)$ and solving for the extremum with the
restriction that $0 < \alpha \le 1$, we find that

$$\alpha^* = \begin{cases} \dfrac{1}{\mathcal{E}_b/2J_0} & \mathcal{E}_b/J_0 \ge 2 \\ 1 & \mathcal{E}_b/J_0 < 2 \end{cases} \tag{12.3-9}$$

The corresponding error probability for the worst-case partial-band jammer, for
$\mathcal{E}_b/J_0 \ge 2$, is

$$P_2 = \frac{e^{-1}}{\mathcal{E}_b/J_0} \tag{12.3-10}$$


Whereas the error probability decreases exponentially for full-band jamming, we now
find that the error probability decreases only inversely with $\mathcal{E}_b/J_0$ for the
worst-case partial-band jamming. This result is similar to the error rate performance of
binary FSK in a Rayleigh fading channel (see Section 13.3) and to the uncoded DS spread
spectrum system corrupted by worst-case pulse interference (see Section 12.2-3).
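The jammer's optimization in Equations 12.3-8 through 12.3-10 can be checked numerically.
The sketch below is only an illustration: the function names are assumptions, and `snr`
denotes $\mathcal{E}_b/J_0$ as a linear ratio (not in decibels).

```python
import math

def p2_partial_band(alpha, snr):
    """Eq. 12.3-8: average error probability for jamming fraction alpha."""
    return 0.5 * alpha * math.exp(-alpha * snr / 2.0)

def worst_case_alpha(snr):
    """Eq. 12.3-9: jamming fraction that maximizes the error probability."""
    return 2.0 / snr if snr >= 2.0 else 1.0

def p2_worst_case(snr):
    """Error probability under the worst-case jammer (Eq. 12.3-10 when snr >= 2)."""
    return p2_partial_band(worst_case_alpha(snr), snr)

if __name__ == "__main__":
    for snr_db in (5, 10, 20, 30):
        snr = 10.0 ** (snr_db / 10.0)
        print(snr_db, worst_case_alpha(snr), p2_worst_case(snr), p2_partial_band(1.0, snr))
```

Comparing the last two printed columns shows the inverse-linear decay under worst-case
partial-band jamming against the exponential decay under full-band jamming ($\alpha = 1$).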

As we shall demonstrate below, signal diversity obtained by means of coding 
provides a significant improvement in performance relative to uncoded signals. This 
same approach to signal design is also effective for signaling over a fading channel, as 
we shall demonstrate in Chapter 13. 

To illustrate the benefits of diversity in an FH spread spectrum signal with partial- 
band interference, we assume that the same information symbol is transmitted by binary 
FSK on L independent frequency hops. This may be accomplished by subdividing 
the signaling interval into L subintervals, as described previously for fast frequency 
hopping. After the hopping pattern is removed, the signal is demodulated by passing it 
through a pair of matched filters whose outputs are square-law-detected and sampled 
at the end of each subinterval. The square-law-detected signals corresponding to the L 
frequency hops are weighted and summed to form the two decision variables (metrics),
which are denoted as $U_1$ and $U_2$.

When the decision variable $U_1$ contains the signal components, $U_1$ and $U_2$ may be
expressed as

$$U_1 = \sum_{k=1}^{L} \beta_k \left|2\mathcal{E}_c + N_{1k}\right|^2, \qquad U_2 = \sum_{k=1}^{L} \beta_k \left|N_{2k}\right|^2 \tag{12.3-11}$$

where $\{\beta_k\}$ represent the weighting coefficients, $\mathcal{E}_c$ is the signal energy per
chip in the $L$-chip symbol, and $\{N_{jk}\}$ represent the additive Gaussian noise terms at the
output of the matched filters.

The coefficients are optimally selected to prevent the interference from saturating
the combiner should the transmitted frequencies be successfully hit in one or more hops.
Ideally, $\beta_k$ is selected to be equal to the reciprocal of the variance of the
corresponding noise terms $\{N_{jk}\}$. Thus, the noise variance for each chip is normalized
to unity by this weighting and the corresponding signal is also scaled accordingly. This
means that when the signal frequencies on a particular hop are interfered, the
corresponding weight is very small. In the absence of interference on a given hop, the
weight is relatively large. In practice, for partial-band interference, the weighting may be
accomplished by use of an AGC having a gain that is set on the basis of noise power
measurements obtained from frequency bands adjacent to the transmitted tones. This is
equivalent to having side information (knowledge of jammer state) at the decoder.

Suppose that we have broadband Gaussian noise with power spectral density $N_0$
and partial-band interference, over $\alpha W$ of the frequency band, which is also Gaussian
with power spectral density $J_0/\alpha$. In the presence of partial-band interference, the
variance of the real and imaginary parts of the noise terms $N_{1k}$ and $N_{2k}$ is

$$\sigma_k^2 = \tfrac{1}{2}E\left(|N_{1k}|^2\right) = \tfrac{1}{2}E\left(|N_{2k}|^2\right) = 2\mathcal{E}_c\left(N_0 + \frac{J_0}{\alpha}\right) \tag{12.3-12}$$

In this case, we select $\beta_k = 1/\sigma_k^2 = [2\mathcal{E}_c(N_0 + J_0/\alpha)]^{-1}$. In the absence of
partial-band interference, $\sigma_k^2 = 2\mathcal{E}_c N_0$ and, hence, $\beta_k = (2\mathcal{E}_c N_0)^{-1}$.
Note that $\beta_k$ is a random variable. It is convenient to normalize the variance of the
noise components to unity by defining $N'_{1k} = \sqrt{\beta_k}\,N_{1k}$ and
$N'_{2k} = \sqrt{\beta_k}\,N_{2k}$, where $\beta_k = 1/\sigma_k^2$ for the corresponding values of
$\sigma_k^2$.
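The weighting rule can be sketched in a few lines. The example below assumes that the
receiver knows which hops were jammed (the side information mentioned above) and forms
the metrics of Equation 12.3-11 with $\beta_k = 1/\sigma_k^2$ from Equation 12.3-12. The function
name, argument names, and sample values are illustrative assumptions.

```python
def weighted_metrics(y1, y2, jammed, e_c, n0, j0, alpha):
    """Form U_1 and U_2 from per-hop matched filter outputs y1[k], y2[k].

    jammed[k] is True when hop k is hit by the partial-band interference.
    """
    u1 = u2 = 0.0
    for r1, r2, hit in zip(y1, y2, jammed):
        # Noise variance per Eq. 12.3-12; the J_0/alpha term is absent on clean hops.
        sigma2 = 2.0 * e_c * (n0 + (j0 / alpha if hit else 0.0))
        beta = 1.0 / sigma2
        u1 += beta * abs(r1) ** 2
        u2 += beta * abs(r2) ** 2
    return u1, u2

if __name__ == "__main__":
    # Hypothetical two-hop example: the second hop is jammed and the
    # interference lands mostly in the filter that carries no signal.
    y1 = [2 + 0j, 2 + 0j]
    y2 = [0j, 8 + 0j]
    print(weighted_metrics(y1, y2, [False, True], e_c=1.0, n0=0.5, j0=1.0, alpha=0.1))
```

In this example, unweighted square-law combining would give 8 for the signal branch and
64 for the other branch, which is an error, whereas the weighted metrics favor the
correct branch.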

An error occurs in the demodulation if $U_2 > U_1$. Although it is possible to determine
the exact error probability, we shall resort to the Chernov bound, which yields a result
that is much easier to evaluate and interpret. Specifically, the Chernov (upper) bound
on the error probability is

$$\begin{aligned} P_2 = P(U_2 - U_1 > 0) &\le E\{\exp[\nu(U_2 - U_1)]\} \\ &= E\left\{\exp\left[-\nu\sum_{k=1}^{L}\left(\left|2\sqrt{\beta_k}\,\mathcal{E}_c + N'_{1k}\right|^2 - \left|N'_{2k}\right|^2\right)\right]\right\} \end{aligned} \tag{12.3-13}$$


where $\nu > 0$ is a variable that is optimized to yield the tightest possible bound.

The averaging in Equation 12.3-13 is performed with respect to the statistics of
the noise components and the statistics of the weighting coefficients $\{\beta_k\}$, which are
random as a consequence of the statistical nature of the interference. Keeping the
$\{\beta_k\}$ fixed and averaging over the noise statistics first, we obtain

$$\begin{aligned} P_2(\boldsymbol{\beta}) &\le E\left[\exp\left(-\nu\sum_{k=1}^{L}\left|2\sqrt{\beta_k}\,\mathcal{E}_c + N'_{1k}\right|^2 + \nu\sum_{k=1}^{L}\left|N'_{2k}\right|^2\right)\right] \\ &= \prod_{k=1}^{L} E\left[\exp\left(-\nu\left|2\sqrt{\beta_k}\,\mathcal{E}_c + N'_{1k}\right|^2\right)\right] E\left[\exp\left(\nu\left|N'_{2k}\right|^2\right)\right] \\ &= \prod_{k=1}^{L} \frac{1}{1-4\nu^2}\exp\left(\frac{-4\mathcal{E}_c^2\beta_k\nu}{1+2\nu}\right) \end{aligned} \tag{12.3-14}$$


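Since the FSK tones are interfered with probability $\alpha$, it follows that
$\beta_k = [2\mathcal{E}_c(N_0 + J_0/\alpha)]^{-1}$ with probability $\alpha$ and $(2\mathcal{E}_c N_0)^{-1}$ with
probability $1 - \alpha$. Hence, the Chernov bound is

$$\begin{aligned} P_2 &\le \prod_{k=1}^{L}\left\{\frac{\alpha}{1-4\nu^2}\exp\left[\frac{-2\mathcal{E}_c\nu}{(N_0+J_0/\alpha)(1+2\nu)}\right] + \frac{1-\alpha}{1-4\nu^2}\exp\left[\frac{-2\mathcal{E}_c\nu}{N_0(1+2\nu)}\right]\right\} \\ &= \left\{\frac{\alpha}{1-4\nu^2}\exp\left[\frac{-2\mathcal{E}_c\nu}{(N_0+J_0/\alpha)(1+2\nu)}\right] + \frac{1-\alpha}{1-4\nu^2}\exp\left[\frac{-2\mathcal{E}_c\nu}{N_0(1+2\nu)}\right]\right\}^{L} \end{aligned} \tag{12.3-15}$$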


The next step is to optimize the bound in Equation 12.3-15 with respect to the
variable $\nu$. In its present form, however, the bound is messy to manipulate. A significant
simplification occurs if we assume that $J_0/\alpha \gg N_0$, which renders the second term in
Equation 12.3-15 negligible compared with the first. Alternatively, we let $N_0 = 0$, so
that the bound on $P_2$ reduces to

$$P_2 \le \left\{\frac{\alpha}{1-4\nu^2}\exp\left[\frac{-2\alpha\nu\mathcal{E}_c}{J_0(1+2\nu)}\right]\right\}^{L} \tag{12.3-16}$$


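The minimum value of this bound with respect to $\nu$ and the maximum with respect to
$\alpha$ (worst-case partial-band interference) is easily shown to occur when
$\alpha = 3J_0/\mathcal{E}_c \le 1$ and $\nu = \frac{1}{4}$. For these values of the parameters,
Equation 12.3-16 reduces to

$$P_2 \le P_2(L) = \left(\frac{4}{e\gamma_c}\right)^{L} = \left(\frac{1.47}{\gamma_c}\right)^{L}, \qquad \gamma_c = \frac{\mathcal{E}_c}{J_0} = \frac{\mathcal{E}_b}{LJ_0} \ge 3 \tag{12.3-17}$$

where $\gamma_c$ is the SNR per chip in the $L$-chip symbol.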

The result in Equation 12.3-17 was first derived by Viterbi and Jacobs (1975). 
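The saddle point quoted above can be checked by brute force. The sketch below evaluates
the per-chip factor of Equation 12.3-16 (with $N_0 = 0$) on a grid, takes the maximum over
$\alpha$ for each $\nu$, and then the minimum over $\nu$. The grid sizes and function names are
arbitrary assumptions made for illustration.

```python
import math

def chip_bound(alpha, nu, gamma_c):
    """Per-chip factor of Eq. 12.3-16 with N_0 = 0; gamma_c = E_c/J_0."""
    return alpha / (1.0 - 4.0 * nu ** 2) * math.exp(
        -2.0 * alpha * nu * gamma_c / (1.0 + 2.0 * nu))

def minimax_chip_bound(gamma_c, n_alpha=1000, n_nu=400):
    """Return (value, nu, alpha) at min over nu of max over alpha, on a grid."""
    best = (float("inf"), None, None)
    for j in range(1, n_nu // 2):              # 0 < nu < 1/2
        nu = j / n_nu
        worst = max((chip_bound(i / n_alpha, nu, gamma_c), i / n_alpha)
                    for i in range(1, n_alpha + 1))   # 0 < alpha <= 1
        if worst[0] < best[0]:
            best = (worst[0], nu, worst[1])
    return best

if __name__ == "__main__":
    gamma_c = 6.0                               # hypothetical SNR per chip (>= 3)
    value, nu, alpha = minimax_chip_bound(gamma_c)
    print(value, 4.0 / (math.e * gamma_c), nu, alpha, 3.0 / gamma_c)
```

For $\gamma_c \ge 3$ the grid search should land near $\nu = 1/4$ and $\alpha = 3/\gamma_c$, with a value
close to $4/(e\gamma_c)$.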

We observe that the probability of error for the worst-case partial-band interference
decreases exponentially with an increase in the SNR per chip $\gamma_c$. This result is very
similar to the performance characteristics of diversity techniques for Rayleigh fading
channels (see Section 13.3). We may express the right-hand side of Equation 12.3-17
in the form

$$P_2(L) = \exp[-\gamma_b h(\gamma_c)] \tag{12.3-18}$$

where the function $h(\gamma_c)$ is defined as

$$h(\gamma_c) = -\frac{1}{\gamma_c}\left[\ln\left(\frac{4}{\gamma_c}\right) - 1\right] \tag{12.3-19}$$

A plot of $h(\gamma_c)$ is given in Figure 12.3-5. We observe that the function has a maximum
value of $\frac{1}{4}$ at $\gamma_c = 4$. Consequently, there is an optimum SNR per chip of
$10\log\gamma_c = 6$ dB. At the optimum SNR, the error rate is upper-bounded as

$$P_2 \le P_2(L_{\text{opt}}) = e^{-\gamma_b/4} \tag{12.3-20}$$

FIGURE 12.3-5

Graph of the function $h(\gamma_c)$.


When we compare the error probability bound in Equation 12.3-20 with the
error probability for binary FSK in spectrally flat noise, which is given by
Equation 12.3-1, we see that the combined effect of worst-case partial-band interference
and the noncoherent combining loss in the square-law combining of the $L$ chips is 3 dB.
We emphasize, however, that for a given $\mathcal{E}_b/J_0$, the loss is greater when the order of
diversity is not optimally selected.
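The dependence of the bound on the order of diversity can be explored with a short
sketch. It evaluates $h(\gamma_c)$ from Equation 12.3-19 and searches the integer values of $L$
for which Equation 12.3-17 applies ($\gamma_c = \gamma_b/L \ge 3$). The helper names and the example
SNR are assumptions chosen for illustration.

```python
import math

def h(gamma_c):
    """Eq. 12.3-19."""
    return -(math.log(4.0 / gamma_c) - 1.0) / gamma_c

def p2_bound(gamma_b, L):
    """Eq. 12.3-17 with gamma_c = gamma_b / L; requires gamma_c >= 3."""
    gamma_c = gamma_b / L
    if gamma_c < 3.0:
        raise ValueError("Eq. 12.3-17 requires an SNR per chip of at least 3")
    return (4.0 / (math.e * gamma_c)) ** L

def best_diversity(gamma_b):
    """Integer L that minimizes the bound (needs gamma_b >= 3)."""
    candidates = range(1, int(gamma_b // 3) + 1)
    return min(candidates, key=lambda L: p2_bound(gamma_b, L))

if __name__ == "__main__":
    gamma_b = 40.0                      # hypothetical SNR per bit (about 16 dB)
    L_opt = best_diversity(gamma_b)
    print(L_opt, p2_bound(gamma_b, L_opt), math.exp(-gamma_b / 4.0))
    print(p2_bound(gamma_b, 1), p2_bound(gamma_b, 4))   # non-optimal choices
```

With $\gamma_b = 40$ the search selects the $L$ for which $\gamma_c = 4$, and the bound coincides with
Equation 12.3-20; smaller orders of diversity give a noticeably looser bound.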

Coding provides a means for improving the performance of the FH system
corrupted by partial-band interference. In particular, if a block orthogonal code is used,
with $M = 2^k$ codewords and $L$th-order diversity per codeword, the probability of a
codeword error is upper-bounded as

$$P_e \le (2^k - 1)P_2(L) = (2^k - 1)\left(\frac{1.47}{\gamma_c}\right)^{L} = (2^k - 1)\left(\frac{1.47}{k\gamma_b/L}\right)^{L} \tag{12.3-21}$$

and the equivalent bit error probability is upper-bounded as

$$P_b \le 2^{k-1}\left(\frac{1.47}{k\gamma_b/L}\right)^{L} \tag{12.3-22}$$
FIGURE 12.3-6

Performance of binary and octal FSK with L-order diversity for a channel with worst-case
partial-band interference. [Horizontal axis: SNR per bit, $\gamma_b$ (dB).]


Figure 12.3-6 illustrates the probability of a bit error for $L = 1, 2, 4, 8$ and $k = 1, 3$.
With an optimum choice of diversity, the upper bound can be expressed as

$$P_b \le 2^{k-1}\exp\left(-\tfrac{1}{4}k\gamma_b\right) = \tfrac{1}{2}\exp\left[-k\left(\tfrac{1}{4}\gamma_b - \ln 2\right)\right] \tag{12.3-23}$$

Thus, we have an improvement in performance by an amount equal to
$10\log[k(1 - 2.77/\gamma_b)]$. For example, if $\gamma_b = 10$ and $k = 3$ (octal modulation), then
the gain is 3.4 dB, while if $k = 5$, then the gain is 5.6 dB.
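The two gain figures quoted above follow directly from Equation 12.3-23, since
$4\ln 2 \approx 2.77$. A minimal check, with an assumed function name:

```python
import math

def orthogonal_code_gain_db(k, gamma_b):
    """Gain 10 log10[k (1 - 4 ln2 / gamma_b)] implied by Eq. 12.3-23."""
    return 10.0 * math.log10(k * (1.0 - 4.0 * math.log(2.0) / gamma_b))

if __name__ == "__main__":
    for k in (3, 5):
        print(k, round(orthogonal_code_gain_db(k, 10.0), 1), "dB")
```

The expression is meaningful only when $\gamma_b > 4\ln 2$, so that the argument of the
logarithm is positive.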

Additional gains can be achieved by employing concatenated codes in conjunction
with soft-decision decoding. In the example below, we employ a dual-$k$ convolutional
code as the outer code and a Hadamard code as the inner code on the channel with
partial-band interference.
EXAMPLE 12.3-1. Suppose we use a Hadamard $H(n, k)$ constant weight code with
on-off keying (OOK) modulation for each code bit. The minimum distance of the code is
$d_{\min} = \frac{1}{2}n$, and, hence, the effective order of diversity obtained with OOK
modulation is $\frac{1}{2}d_{\min} = \frac{1}{4}n$. There are $\frac{1}{2}n$ FH tones transmitted per
code word. Hence,

$$\gamma_c = \frac{k}{\frac{1}{2}n}\gamma_b = 2R_c\gamma_b \tag{12.3-24}$$

when this code is used alone. The bit error rate performance for soft-decision decoding
of these codes for the partial-band interference channel is upper-bounded as

$$P_b \le 2^{k-1}P_2\left(\tfrac{1}{2}d_{\min}\right) = 2^{k-1}\left(\frac{1.47}{2R_c\gamma_b}\right)^{n/4} \tag{12.3-25}$$

Now, if a Hadamard $(n, k)$ code is used as the inner code and a rate 1/2 dual-$k$
convolutional code (see Section 8.7) is the outer code, the bit error performance in the
presence of worst-case partial-band interference is (see Equation 8.7-5)

$$P_b \le \frac{2^{k-1}}{2^k - 1}\sum_{m=4}^{\infty}\beta_m P_2\left(\tfrac{1}{2}m\,d_{\min}\right) = \frac{2^{k-1}}{2^k - 1}\sum_{m=4}^{\infty}\beta_m P_2\left(\tfrac{1}{4}mn\right) \tag{12.3-26}$$

where $P_2(L)$ is given by Equation 12.3-17 with

$$\gamma_c = \frac{k}{n}\gamma_b = R_c\gamma_b \tag{12.3-27}$$

Figure 12.3-7 illustrates the performance of the dual-$k$ codes for $k = 5$, 4, and 3
concatenated with the Hadamard $H(20, 5)$, $H(16, 4)$, and $H(12, 3)$ codes, respectively.

In the above discussion, we have focused on soft-decision decoding. On the other
hand, the performance achieved with hard-decision decoding is significantly (several
decibels) poorer than that obtained with soft-decision decoding. In a concatenated
coding scheme, however, a mixture involving soft-decision decoding of the inner code
and hard-decision decoding of the outer code represents a reasonable compromise
between decoding complexity and performance.

FIGURE 12.3-7

Performance of dual-k codes concatenated with
Hadamard codes for a channel with worst-case
partial-band interference. [Horizontal axis: SNR per bit, $\gamma_b$ (dB).]

Finally, we wish to indicate that another serious threat in an FH spread spectrum
system is partial-band multitone interference. This type of interference is similar in
effect to partial-band spectrally flat noise interference. Diversity obtained through coding
is an effective means for improving the performance of the FH system. An additional
improvement is achieved by properly weighting the demodulator outputs so as to
suppress the effects of the interference.

12.3-3 A CDMA System Based on FH Spread Spectrum Signals 

In Section 12.2-2, we considered a CDMA system based on the use of DS spread 
spectrum signals. As previously indicated, it is also possible to have a CDMA system 
based on FH spread spectrum signals. Each transmitter-receiver pair in such a system 
is assigned its own pseudorandom FH pattern. Aside from this distinguishing feature, 
the transmitters and receivers of all the users may be identical in that they may have 
identical encoders, decoders, modulators, and demodulators. 

CDMA systems based on FH spread spectrum signals are particularly attractive
for mobile (land, air, sea) users because timing requirements are not as stringent as in a
DS spread spectrum signal. In addition, frequency synthesis techniques and associated
hardware have been developed that make it possible to frequency-hop over bandwidths
that are significantly larger than those currently possible with DS spread spectrum
systems. Consequently, larger processing gains are possible with FH. The capacity of
CDMA with FH is also relatively high. Viterbi (1978) has shown that with dual-$k$ codes
and $M$-ary FSK modulation, it is possible to accommodate up to $\frac{3}{8}W/R$ simultaneous
users who transmit at an information rate $R$ bits/s over a channel with bandwidth $W$.

One of the earliest CDMA systems based on FH coded spread spectrum signals 
was built to provide multiple-access tactical satellite communications for small mobile 
(land, sea, air) terminals each of which transmitted relatively short messages over the 
channel intermittently. The system was called the Tactical Transmission System (TATS), 
and it is described in a paper by Drouilhet and Bernstein (1969). 

An octal Reed-Solomon (7, 2) code is used in the TATS system. Thus, two 3-bit
information symbols from the input to the encoder are used to generate a seven-symbol
code word. Each 3-bit coded symbol is transmitted by means of octal FSK modulation.
The eight possible frequencies are spaced $1/T_c$ Hz apart, where $T_c$ is the time (chip)
duration of a single frequency transmission. In addition to the seven symbols in a code
word, an eighth symbol is included. That symbol and its corresponding frequency are
fixed and transmitted at the beginning of each code word for the purpose of providing
timing and frequency synchronization† at the receiver. Consequently, each code word
is transmitted in $8T_c$ seconds.


†Since mobile users are involved, there is a Doppler frequency offset associated with transmission. This
frequency offset must be tracked and compensated for in the demodulation of the signal. The sync symbol
is used for this purpose.
TATS was designed to transmit at information rates of 75 and 2400 bits/s. Hence,
$T_c = 10$ ms and 312.5 μs, respectively. Each frequency tone corresponding to a code
symbol is frequency-hopped. Hence, the hopping rate is 100 hops/s at the 75-bits/s rate
and 3200 hops/s at the 2400-bits/s rate.
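These chip durations follow from the code word structure described above: each code
word carries 6 information bits and occupies $8T_c$ seconds. A small sketch of the
arithmetic, with assumed function and parameter names:

```python
def tats_timing(bit_rate, info_bits_per_codeword=6, chips_per_codeword=8):
    """Return (chip duration in seconds, hopping rate in hops/s)."""
    codeword_duration = info_bits_per_codeword / bit_rate   # equals 8 T_c
    chip_duration = codeword_duration / chips_per_codeword
    return chip_duration, 1.0 / chip_duration

if __name__ == "__main__":
    for rate in (75, 2400):
        t_c, hops = tats_timing(rate)
        print(rate, "bits/s:", t_c * 1e3, "ms per chip,", hops, "hops/s")
```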

There are $M = 2^6 = 64$ code words in the Reed-Solomon (7, 2) code and the
minimum distance of the code is $d_{\min} = 6$. This means that the code provides an
effective order of diversity equal to 6.

At the receiver, the received signal is first dehopped and then demodulated by 
passing it through a parallel bank of eight matched filters, where each filter is tuned to 
one of the eight possible frequencies. Each filter output is envelope-detected, quantized 
to 4 bits (one of 16 levels), and fed to the decoder. The decoder takes the 56 filter 
outputs corresponding to the reception of each seven-symbol code word and forms 64 
decision variables corresponding to the 64 possible code words in the (7, 2) code by 
linearly combining the appropriate envelope-detected outputs. A decision is made in 
favor of the code word having the largest decision variable. 

By limiting the matched filter outputs to 16 levels, interference (crosstalk) from
other users of the channel causes a relatively small loss in performance (0.75 dB with
strong interference on one chip and 1.5 dB with strong interference on two chips out of
the seven). The AGC used in TATS has a time constant greater than the chip interval $T_c$,
so that no attempt is made to perform optimum weighting of the demodulator outputs
as described in Section 12.3-2.

The derivation of the error probability for the TATS signal in AWGN and worst- 
case partial-band interference is left as an exercise for the reader (Problems 12.23 
and 12.24). 


■ 12.4 

OTHER TYPES OF SPREAD SPECTRUM SIGNALS 

DS and FH are the most common forms of spread spectrum signals used in practice.
However, other methods may be used to introduce pseudorandomness in a spread
spectrum signal. One method, which is analogous to FH, is time hopping (TH). In TH,
a time interval, which is selected to be much larger than the reciprocal of the information
rate, is subdivided into a large number of time slots. The coded information symbols are
transmitted in a pseudorandomly selected time slot as a block of one or more codewords.
PSK modulation may be used to transmit the coded bits.

For example, suppose that a time interval $T$ is subdivided into 1000 time slots of
width $T/1000$ each. With an information bit rate of $R$ bits/s, the number of bits to be
transmitted in $T$ seconds is $RT$. Coding increases this number to $RT/R_c$ bits, where $R_c$
is the code rate. Consequently, in a time interval of $T/1000$ s, we must transmit $RT/R_c$
bits. If binary PSK is used as the modulation method, the bit rate is $1000R/R_c$ and the
bandwidth required is approximately $W = 1000R/R_c$.
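The arithmetic of this example generalizes to any number of slots. In the sketch below
the function name and the numerical values are hypothetical.

```python
def th_burst_bit_rate(info_rate, code_rate, num_slots):
    """Bit rate inside the occupied slot: R*T/R_c bits sent in T/num_slots seconds."""
    return num_slots * info_rate / code_rate

if __name__ == "__main__":
    # Hypothetical values: R = 1000 bits/s, R_c = 1/2, 1000 time slots.
    print(th_burst_bit_rate(1000.0, 0.5, 1000), "bits/s during the burst")
```

With binary PSK, the required bandwidth is approximately equal to this burst bit rate.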

A block diagram of a transmitter and a receiver for a TH spread spectrum system
is shown in Figure 12.4-1. Because of the burst characteristics of the transmitted
signal, buffer storage must be provided at the transmitter in a TH system, as shown in
Figure 12.4-1. A buffer may also be used at the receiver to provide a uniform data
stream to the user.

FIGURE 12.4-1

Block diagram of time-hopping (TH) spread spectrum system.

Just as partial-band interference degrades an uncoded FH spread spectrum system, 
partial-time (pulsed) interference has a similar effect on a TH spread spectrum system. 
Coding and interleaving are effective means for combating this type of interference, as 
we have already demonstrated for FH and DS systems. Perhaps the major disadvantage 
of a TH system is the stringent timing requirements compared not only with FH but, 
also, with DS. 

Other types of spread spectrum signals can be obtained by combining DS, FH, and 
TH. For example, we may have a hybrid DS/FH, which means that a PN sequence is 
used in combination with frequency hopping. The signal transmitted on a single hop 
consists of a DS spread spectrum signal which is demodulated coherently. However, 
the received signals from different hops are combined noncoherently (envelope or 
square-law combining). Since coherent detection is performed within a hop, there is an 
advantage obtained relative to a pure FH system. However, the price paid for the gain 
in performance is an increase in complexity, greater cost, and more stringent timing 
requirements. 

Another possible hybrid spread spectrum signal is DS/TH. This does not seem to
be as practical as DS/FH, primarily because of an increase in system complexity and
more stringent timing requirements.


■ 12.5 

SYNCHRONIZATION OF SPREAD SPECTRUM SYSTEMS 

Time synchronization of the receiver to the received spread spectrum signal may be 
separated into two phases. There is an initial acquisition phase and a tracking phase 
after the signal has been initially acquired. 
Acquisition In a direct sequence spread spectrum system, the PN code must be
time-synchronized to within a small fraction of the chip interval $T_c \approx 1/W$. The
problem of initial synchronization may be viewed as one in which we attempt to synchronize
in time the receiver clock to the transmitter clock. Usually, extremely accurate and stable
time clocks are used in spread spectrum systems. Consequently, accurate time clocks
result in a reduction of the time uncertainty between the receiver and the transmitter.
However, there is always an initial timing uncertainty due to range uncertainty between
the transmitter and the receiver. This is especially a problem when communication is
taking place between two mobile users. In any case, the usual procedure for establishing
initial synchronization is for the transmitter to send a known pseudorandom data
sequence to the receiver. The receiver is continuously in a search mode looking for this
sequence in order to establish initial synchronization.

Let us suppose that the initial timing uncertainty is $T_u$ and the chip duration is $T_c$.
If initial synchronization is to take place in the presence of additive noise and other
interference, it is necessary to dwell for $T_d = NT_c$ in order to test synchronism at each
time instant. If we search over the time uncertainty interval in (coarse) time steps of
$\frac{1}{2}T_c$, then the time required to establish initial synchronization is

$$T_{\text{init sync}} = \frac{T_u}{\frac{1}{2}T_c}\,NT_c = 2NT_u \tag{12.5-1}$$

Clearly, the synchronization sequence transmitted to the receiver must be at least as
long as $2NT_u$ in order for the receiver to have sufficient time to perform the necessary
search in a serial fashion.
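Equation 12.5-1 is simply the number of delay cells multiplied by the dwell time per cell.
The following sketch makes the step size a parameter; the function name and the example
numbers are illustrative assumptions.

```python
def serial_search_time(t_u, t_c, n_dwell_chips, step_fraction=0.5):
    """Time to step through an uncertainty t_u in steps of step_fraction * t_c,
    dwelling n_dwell_chips * t_c at each step (Eq. 12.5-1 for step_fraction = 1/2)."""
    cells = t_u / (step_fraction * t_c)
    return cells * n_dwell_chips * t_c

if __name__ == "__main__":
    # Hypothetical values: T_u = 1 ms, T_c = 1 microsecond, N = 100 chips.
    print(serial_search_time(1e-3, 1e-6, 100), "seconds")
```

For half-chip steps the result equals $2NT_u$ and does not depend on $T_c$.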

In principle, matched filtering or cross correlation are optimum methods for estab- 
lishing initial synchronization. A filter matched to the known data waveform generated 
from the known pseudorandom sequence continuously looks for exceedence of a pre- 
determined threshold. When this occurs, initial synchronization is established and the 
demodulator enters the “data receive” mode. 

Alternatively, we may use a sliding correlator as shown in Figure 12.5-1. The
correlator cycles through the time uncertainty, usually in discrete time intervals of
$\frac{1}{2}T_c$, and correlates the received signal with the known synchronization sequence.
The cross correlation is performed over the time interval $NT_c$ ($N$ chips) and the
correlator output is compared with a threshold to determine if the known signal sequence
is present. If the threshold is not exceeded, the known reference sequence is advanced in
time by $\frac{1}{2}T_c$ seconds and the correlation process is repeated. These operations are
performed until a signal is detected or until the search has been performed over the time
uncertainty interval $T_u$. In the latter case, the search process is then repeated.

FIGURE 12.5-1

A sliding correlator for DS signal acquisition.
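A minimal simulation of the serial search can make the procedure concrete. The sketch
below is a simplified illustration and not the receiver of Figure 12.5-1: it works at
baseband with one sample per chip, steps the reference in whole chips rather than half
chips, and uses a hypothetical 127-chip PN sequence, noise level, and threshold.

```python
import random

def pn_sequence(length=127):
    """+/-1 chips from a 7-stage LFSR with a[n+7] = a[n+1] XOR a[n] (period 127)."""
    state = [1, 0, 0, 0, 0, 0, 0]
    chips = []
    for _ in range(length):
        chips.append(1 - 2 * state[-1])        # map bit 0 to +1 and bit 1 to -1
        feedback = state[-1] ^ state[-2]
        state = [feedback] + state[:-1]
    return chips

def received_signal(reference, delay, sigma, rng):
    """Reference delayed cyclically by `delay` chips plus Gaussian noise."""
    n = len(reference)
    return [reference[(i - delay) % n] + rng.gauss(0.0, sigma) for i in range(n)]

def sliding_correlator(received, reference, n_dwell, threshold):
    """Step the local reference one chip at a time; stop when the correlation
    over n_dwell chips exceeds the threshold. Returns (delay, correlation)."""
    n = len(reference)
    for trial in range(n):
        corr = sum(received[i] * reference[(i - trial) % n] for i in range(n_dwell))
        if corr > threshold:
            return trial, corr
    return None, None                           # nothing found in one full pass

if __name__ == "__main__":
    ref = pn_sequence()
    rx = received_signal(ref, delay=37, sigma=0.5, rng=random.Random(1))
    print(sliding_correlator(rx, ref, n_dwell=127, threshold=64.0))
```

When no trial delay exceeds the threshold, the function returns `(None, None)`; in the
procedure described above, the search would then be repeated.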

A similar process may also be used for FH signals. In this case, the problem is to 
synchronize the PN code that controls the hopped frequency pattern. To accomplish 
this initial synchronization, a known FH signal is transmitted to the receiver. The initial 
acquisition system at the receiver looks for this known FH signal pattern. For example, 
a bank of matched filters tuned to the transmitted frequencies in the known pattern 
may be employed. Their outputs must be properly delayed, envelope- or square-law- 
detected, weighted, if necessary, and added (noncoherent integration) to produce the 
signal output which is compared with a threshold. A signal present is declared when 
the threshold is exceeded. The search process is usually performed continuously in time 
until a threshold is exceeded. A block diagram illustrating this signal acquisition scheme 
is given in Figure 12.5-2. As an alternative, a single matched-filter-envelope detector 
pair may be used, preceded by an FH pattern generator and followed by a postdetection 
integrator and a threshold detector. This configuration, shown in Figure 12.5-3, is based 
on a serial search and is akin to the sliding correlator for DS spread spectrum signals. 

The sliding correlator for DS signals and its counterpart for FH signals, shown in
Figure 12.5-3, both perform a serial search that is generally time-consuming. As
an alternative, one may introduce some degree of parallelism by having two or more
such correlators operating in parallel and searching over non-overlapping time slots.
In such a case, the search time is reduced at the expense of a more complex and costly
implementation.



FIGURE 12.5-2 

System for acquisition of an FH signal. 
FIGURE 12.5-3

Alternative system for acquisition of an FH signal. 

During the search mode, there may be false alarms that occur at the designed false 
alarm rate of the system. To handle the occasional false alarms, it is necessary to have 
an additional method or circuit that checks to confirm that the received signal at the 
output of the correlator remains above the threshold. With such a detection strategy, a 
large noise pulse that causes a false alarm will cause only a temporary exceedence of 
the threshold. On the other hand, when a signal is present, the correlator or matched 
filter output will stay above the threshold for the duration of the transmitted signal. 
Thus, if confirmation fails, the search is resumed. 

Another initial search strategy, called a sequential search, has been investigated by 
Ward (1965) and Ward and Yiu (1977). In this method, the dwell time at each delay in 
the search process is made variable by employing a correlator with a variable integration 
period whose (biased) output is compared with two thresholds. Thus, there are three 
possible decisions: 

1. If the upper threshold is exceeded by the correlator output, initial synchronization
is declared established.

2. If the correlator output falls below the lower threshold, the signal is declared absent
at that delay and the search process resumes at a different delay.

3. If the correlator output falls between the two thresholds, the integration time is
increased by one chip and the resulting output is compared with the two thresholds
again.

Hence, steps 1, 2, and 3 are repeated for each chip interval until the correlator output
either exceeds the upper threshold or falls below the lower threshold.

The sequential search method falls in the class of sequential estimation methods
proposed by Wald (1947), which are known to result in a more efficient search in the
sense that the average search time is minimized. Hence, the search time for a sequential
search is less than that for the fixed dwell time integrator.
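The three-way decision rule at a single trial delay can be sketched as follows. This is
an illustration under stated assumptions rather than the scheme analyzed by Ward: the
bias, the two thresholds, and the `max_chips` truncation are hypothetical parameters,
and the input is taken as one baseband sample per chip.

```python
def sequential_test(samples, reference, bias, upper, lower, max_chips):
    """Variable-dwell test at one trial delay.

    Accumulates the biased chip-by-chip correlation and stops as soon as it
    crosses either threshold. Returns (decision, chips_used).
    """
    acc = 0.0
    for n in range(max_chips):
        acc += samples[n] * reference[n] - bias
        if acc >= upper:
            return "sync", n + 1                # step 1: declare synchronization
        if acc <= lower:
            return "dismiss", n + 1             # step 2: move to another delay
        # step 3: between the thresholds, integrate one more chip
    return "undecided", max_chips               # truncation added for this sketch

if __name__ == "__main__":
    ref = [1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1]
    aligned = list(ref)                          # noise-free, correctly aligned
    absent = [0.0] * len(ref)                    # no signal at this delay
    print(sequential_test(aligned, ref, bias=0.5, upper=5.0, lower=-3.0, max_chips=12))
    print(sequential_test(absent, ref, bias=0.5, upper=5.0, lower=-3.0, max_chips=12))
```

The bias makes the accumulated output drift downward when the signal is absent, so that
wrong delays tend to be dismissed after a short dwell, while the correct delay drifts
upward toward the upper threshold.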

In the above discussion, we have considered only time uncertainty in establishing 
initial synchronization. However, another aspect of initial synchronization is frequency 
uncertainty. If the transmitter and/or the receiver are mobile, the relative velocity be- 
tween them results in a Doppler frequency shift in the received signal relative to the 
transmitted signal. Since the receiver does not usually know the relative velocity, a 
priori, the Doppler frequency shift is unknown and must be determined by means of 
a frequency search method. Such a search is usually accomplished in parallel over
a suitably quantized frequency uncertainty interval and serially over the time
uncertainty interval. A block diagram of this scheme is shown in Figure 12.5-4. Appropriate
Doppler frequency search methods can also be devised for FH signals.

Tracking Once the signal is acquired, the initial search process is stopped and fine 
synchronization and tracking begins. The tracking maintains the PN code generator at 
the receiver in synchronism with the incoming signal. Tracking includes both fine chip 
synchronization and, for coherent demodulation, carrier phase tracking. 

The commonly used tracking loop for a DS spread spectrum signal is the
delay-locked loop (DLL), which is shown in Figure 12.5-5. In this tracking loop, the received
signal is applied to two multipliers, where it is multiplied by two outputs from the local
PN code generator, which are delayed relative to each other by an amount $2\delta \le T_c$.
Thus, the product signals are the cross correlations between the received signal and
the PN sequence at the two values of delay. These products are band-pass-filtered
and envelope- (or square-law-) detected and then subtracted. This difference signal
is applied to the loop filter that drives the voltage-controlled clock (VCC). The VCC
serves as the clock for the PN code signal generator.

FIGURE 12.5-4

Initial search for Doppler frequency offset in a DS system.

FIGURE 12.5-5

Delay-locked loop (DLL) for PN code tracking.

If the synchronism is not exact, the filtered output from one correlator will exceed 
the other and the VCC will be appropriately advanced or delayed. At the equilibrium 
point, the two filtered correlator outputs will be equally displaced from the peak value, 
and the PN code generator output will be exactly synchronized to the received signal that 
is fed to the demodulator. We observe that this implementation of the DLL for tracking 
a DS signal is equivalent to the early-late gate bit tracking synchronizer previously 
discussed in Section 5.3-2 and shown in Figure 5.3-5. 

An alternative method for time tracking a DS signal is to use a tau-dither loop
(TDL), illustrated by the block diagram in Figure 12.5-6. The TDL employs a single
“arm” instead of the two “arms” shown in Figure 12.5-5. By providing a suitable
gating waveform, it is possible to make this “single-arm” implementation appear to be
equivalent to the “two-arm” realization. In this case, the cross correlation is regularly
sampled at two values of delay, by stepping the code clock forward or backward in
time by an amount $\delta$. The envelope of the cross correlation that is sampled at $\pm\delta$ has
an amplitude modulation whose phase relative to the tau-dither modulator determines
the sign of the tracking error.

FIGURE 12.5-6

Tau-dither loop (TDL).

A major advantage of the TDL is the less costly implementation resulting from 
elimination of one of the two arms that are employed in the conventional DLL. A 
second and less apparent advantage is that the TDL does not suffer from performance 
degradation that is inherent in the DLL when the amplitude gain in the two arms is not 
properly balanced. 

The DLL (and its equivalent, the TDL) generate an error signal by sampling the
signal correlation function at $\pm\delta$ off the peak as shown in Figure 12.5-7a. This generates
an error signal as shown in Figure 12.5-7b. The analysis of the performance of the DLL
is similar to that for the phase-locked loop (PLL) carried out in Section 5.2. If it were
not for the envelope detectors in the two arms of the DLL, the loop would resemble
a Costas loop. In general, the variance of the time estimation error in the DLL is
inversely proportional to the loop SNR, which depends on the input SNR to the loop
and the loop bandwidth. Its performance is somewhat degraded as in the squaring PLL
by non-linearities inherent in the envelope detectors, but this degradation is relatively
small.
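The shape of the DLL error signal can be reproduced with an idealized model. The sketch
below assumes a triangular PN autocorrelation of base width $2T_c$ (a common idealization
for long PN sequences), ignores noise and filtering, and uses assumed function names and
an assumed sign convention for the timing error.

```python
def pn_autocorrelation(tau, t_c=1.0):
    """Idealized triangular autocorrelation of a PN waveform."""
    return max(0.0, 1.0 - abs(tau) / t_c)

def dll_error(timing_error, delta, t_c=1.0):
    """Difference of the two arm outputs, sampled at -delta and +delta
    relative to the current timing error."""
    return (pn_autocorrelation(timing_error - delta, t_c)
            - pn_autocorrelation(timing_error + delta, t_c))

if __name__ == "__main__":
    delta = 0.5                                  # so that 2*delta equals T_c
    for i in range(-6, 7):
        eps = 0.25 * i
        print(f"{eps:+.2f}  {dll_error(eps, delta):+.3f}")
```

Within $|\epsilon| \le \delta$ the error signal in this model is linear in the timing error
$\epsilon$, with slope $2/T_c$, and it is zero at $\epsilon = 0$, which is the equilibrium point of the
loop.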

A typical tracking technique for FH spread spectrum signals is illustrated in
Figure 12.5-8a. This method is also based on the premise that, although initial acquisition
has been achieved, there is a small timing error between the received signal and the
receiver clock. The band pass filter is tuned to a single intermediate frequency and its
bandwidth is of the order of $1/T_c$, where $T_c$ is the chip interval. Its output is
envelope-detected and then multiplied by the clock signal to produce a three-level signal, as
shown in Figure 12.5-8b, which drives the loop filter. Note that when the chip transitions
from the locally generated sinusoidal waveform do not occur at the same time as the
transitions in the incoming signal, the output of the loop filter will be either negative or
positive, depending on whether the VCC is lagging or advanced relative to the timing
of the input signal. This error signal from the loop filter will provide the control signal
for adjusting the VCC timing signal so as to drive the frequency synthesized pulsed
sinusoid to proper synchronism with the received signal.

FIGURE 12.5-7

Autocorrelation function and tracking error signal for DLL.

(a) Tracking loop for FH signals
(b) Wavefront for tracking an FH signal

FIGURE 12.5-8

Tracking method for FH signals. [From Pickholtz et al. (1982). © 1982 IEEE.]

■ 12.6

BIBLIOGRAPHICAL NOTES AND REFERENCES 

The introductory treatment of spread spectrum signals and their performance that we 
have given in this chapter is necessarily brief. Detailed and more specialized treat- 
ments of signal acquisition techniques, code tracking methods, and hybrid spread 
spectrum systems, as well as other general topics on spread spectrum signals and 
systems, can be found in the vast body of technical literature that now exists on the 
subject. 

Historically, the primary application of spread spectrum communications has been 
in the development of secure (AJ) digital communication systems for military use. 
In fact, prior to 1970, most of the work on the design and development of spread 
spectrum communications was classified. Since then, this trend has been reversed. The 
open literature now contains numerous publications on all aspects of spread spectrum 
signal analysis and design. Moreover, we have recently seen the application of spread 
spectrum signaling techniques to commercial communications such as interoffice radio 
communications (see Pahlavan, 1985), mobile radio communications (see Yue, 1983), 
and digital cellular communications (see Viterbi, 1995). 

A historical perspective on the development of spread spectrum communication 
systems covering the period 1920-1960 is given in a paper by Scholtz (1982). 
Tutorial treatments focusing on the basic concepts are found in the papers by Scholtz 
(1977) and Pickholtz et al. (1982). These papers also contain a large number of ref- 
erences to previous work. In addition, there are two papers by Viterbi (1979, 1985) 
that provide a basic review of the performance characteristics of DS and FH signaling 
techniques. 

Comprehensive treatments of various aspects of analysis and design of spread
spectrum signals and systems, including synchronization techniques, are now available
in the texts by Simon et al. (1985), Peterson et al. (1995), and Holmes (1982). In
addition to these texts, there are several special issues of the IEEE Transactions on
Communications devoted to spread spectrum communications (August 1977 and May
1982) and of the IEEE Journal on Selected Areas in Communications (September
1985, May 1989, May 1990, and June 1993). These issues contain a collection of papers
devoted to a variety of topics, including multiple-access techniques, synchronization 
techniques, and performance analyses with various types of interference. A number of 
important papers that have been published in IEEE journals have also been reprinted in 
book form by the IEEE Press (Dixon, 1976; Cook et al., 1983). Finally, we recommend 
the book by Golomb (1967) as a basic reference on shift register sequences for the 
reader who wishes to delve deeper into this topic. 


PROBLEMS 


12.1 Following the procedure outlined in Example 12.2-2, determine the error rate
performance of a DS spread spectrum system in the presence of CW jamming when the signal

12.2 The sketch in Figure P12.2 illustrates the power spectral densities of a PN spread
spectrum signal and narrowband interference in an uncoded (trivial repetition code) digital
communication system. Referring to Figure 12.2-6, which shows the demodulator for
this signal, sketch the (approximate) spectral characteristics of the signal and the
interference after the multiplication of $r(t)$ with the output of the PN generator. Determine
the fraction of the total interference that appears at the output of the correlator when the
number of PN chips per bit is $L_c$.

12.3 Consider the concatenation of a Reed-Solomon (31, 3) ($q = 32$-ary alphabet) as the outer
code with a Hadamard (16, 5) binary code as the inner code in a DS spread spectrum 
system. Assume that soft-decision decoding is performed on both codes. Determine an 
upper (union) bound on the probability of a bit error based on the minimum distance of 
the concatenated code. 

12.4 The Hadamard $(n, k) = (2^m, m + 1)$ codes are low-rate codes with $d_{\min} = 2^{m-1}$. Determine
the performance of this class of codes for DS spread spectrum signals with binary PSK 
modulation and either soft-decision or hard-decision decoding. 

12.5 A rate 1/2 convolutional code with $d_{\text{free}} = 10$ is used to encode a data sequence occurring
at a rate of 1000 bits/s. The modulation is binary PSK. The DS spread spectrum sequence
has a chip rate of 10 MHz.

a. Determine the coding gain.

b. Determine the processing gain.

c. Determine the interference margin assuming an $\mathcal{E}_b/J_0 = 10$.

12.6 A total of 30 equal-power users are to share a common communication channel by CDMA. 
Each user transmits information at a rate of 10 kbits/s via DS spread spectrum and binary 
PSK. Determine the minimum chip rate to obtain a bit error probability of $10^{-5}$. Additive
noise at the receiver may be ignored in this computation. 

12.7 A CDMA system is designed based on DS spread spectrum with a processing gain of 
1000 and binary PSK modulation. Determine the number of users if each user has equal 
power and the desired level of performance is an error probability of $10^{-6}$. Repeat the
computation if the processing gain is changed to 500.

12.8 A DS spread spectrum system transmits at a rate of 1000 bits/s in the presence of a tone
jammer. The jammer power is 20 dB greater than the desired signal, and the required
$\mathcal{E}_b/J_0$ to achieve satisfactory performance is 10 dB.

a. Determine the spreading bandwidth required to meet the specifications. 

b. If the jammer is a pulse jammer, determine the pulse duty cycle that results in worst- 
case jamming and the corresponding probability of error. 

12.9 A CDMA system consists of 15 equal-power users that transmit information at a rate of 
10,000 bits/s, each using a DS spread spectrum signal operating at a chip rate of 1 MHz. 
The modulation is binary PSK. 

a. Determine the $\mathcal{E}_b/J_0$, where $J_0$ is the spectral density of the combined interference.

b. What is the processing gain? 

c. How much should the processing gain be increased to allow for doubling the number 
of users without affecting the output SNR? 

12.10 A DS binary PSK spread spectrum signal has a processing gain of 500. What is the 
interference margin against a continuous-tone interference if the desired error probability 
is $10^{-5}$?

12.11 Repeat Problem 12.10 if the interference consists of pulsed noise with a duty cycle of 
1 percent. 

12.12 Consider the DS spread spectrum signal 

$$c(t) = \sum_{n=-\infty}^{\infty} c_n\, p(t - nT_c)$$

where $c_n$ is a periodic m sequence with a period $N = 127$ and $p(t)$ is a rectangular pulse
of duration $T_c = 1$ μs. Determine the power spectral density of the signal $c(t)$.

12.13 Suppose that $\{c_{1i}\}$ and $\{c_{2i}\}$ are two binary (0, 1) periodic sequences with periods $N_1$ and
$N_2$, respectively. Determine the period of the sequence obtained by forming the modulo-2
sum of $\{c_{1i}\}$ and $\{c_{2i}\}$.

12.14 An m = 10 maximum-length shift register is used to generate the pseudorandom sequence 
in a DS spread spectrum system. The chip duration is $T_c = 1$ μs, and the bit duration is
$T_b = NT_c$, where $N$ is the length (period) of the m sequence.

a. Determine the processing gain of the system in dB.

b. Determine the interference margin if the required $\mathcal{E}_b/J_0 = 10$ and the jammer is a
tone jammer with an average power $J_{\text{av}}$.

12.15 An FH binary orthogonal FSK system employs an m = 15 stage linear feedback shift 
register that generates a maximum-length sequence. Each state of the shift register selects 
one of L non-overlapping frequency bands in the hopping pattern. The bit rate is 100 bits/s 
and the hop rate is one hop per bit. The demodulator employs noncoherent detection. 

a. Determine the hopping bandwidth for this channel. 

b. What is the processing gain? 

c. What is the probability of error in the presence of AWGN?

12.16 Consider the FH binary orthogonal FSK system described in Problem 12.15. Suppose
that the hop rate is increased to 2 hops/bit. The receiver uses square-law combining to 
combine the signal over the two hops. 

a. Determine the hopping bandwidth for the channel. 

b. What is the processing gain? 

c. What is the error probability in the presence of AWGN? 

12.17 In a fast FH spread spectrum system, the information is transmitted via FSK, with non- 
coherent detection. Suppose there are N = 3 hops/bit, with hard-decision decoding of 
the signal in each hop. 

a. Determine the probability of error for this system in an AWGN channel with power
spectral density $\tfrac{1}{2}N_0$ and an SNR = 13 dB (total SNR over the three hops).

b. Compare the result in (a) with the error probability of an FH spread spectrum system 
that hops once per bit. 

12.18 A slow FH binary FSK system with noncoherent detection operates at $\mathcal{E}_b/J_0 = 10$, with
a hopping bandwidth of 2 GHz, and a bit rate of 10 kbits/s. 

a. What is the processing gain for the system? 

b. If the jammer operates as a partial-band jammer, what is the bandwidth occupancy for 
worst-case jamming? 

c. What is the probability of error for the worst-case partial-band jammer? 

12.19 Determine the error probability for an FH spread spectrum signal in which a binary 
convolutional code is used in combination with binary FSK. The interference on the 
channel is AWGN. The FSK demodulator outputs are square-law-detected and passed 
to the decoder, which performs optimum soft-decision Viterbi decoding as described in 
Chapter 8. Assume that the hopping rate is 1 hop per coded bit. 

12.20 Repeat Problem 12.19 for hard-decision Viterbi decoding. 

12.21 Repeat Problem 12.19 when fast frequency hopping is performed at a hopping rate 
of L hops per coded bit. 

12.22 Repeat Problem 12.19 when fast frequency hopping is performed with L hops per coded 
bit and the decoder is a hard-decision Viterbi decoder. The L chips per coded bit are 
square-law-detected and combined prior to the hard decision. 

12.23 The TATS signal described in Section 12.3-3 is demodulated by a parallel bank of eight 
matched filters (octal FSK), and each filter output is square-law-detected. The eight 
outputs obtained in each of seven signal intervals (56 total outputs) are used to form the 
64 possible decision variables corresponding to the Reed-Solomon (7, 2) code. Determine 
an upper (union) bound of the code word error probability for AWGN and soft-decision 
decoding. 

12.24 Repeat Problem 12.23 for the worst-case partial-band interference channel. 

12.25 Derive the results in Equations 12.2-50 and 12.2-51 from Equation 12.2-49. 

12.26 Show that Equation 12.3-14 follows from Equation 12.3-13.

12.27 Derive Equation 12.3-17 from Equation 12.3-16.

12.28 The parity polynomials for constructing Gold code sequences of length n = 7 are 

$$h_1(X) = X^3 + X + 1$$
$$h_2(X) = X^3 + X^2 + 1$$

Generate all the Gold codes of length 7 and determine the cross correlations of one 
sequence with each of the others. 

12.29 In Section 12.2-3, we demonstrated techniques for evaluating the error probability of a
coded system with interleaving in pulse interference by using the cutoff rate parameter $R_0$.
Use the error probability curves given in Figure P12.29 for rate 1/2 and 1/3 convolutional
codes with soft-decision Viterbi decoding to determine the corresponding error rates for
a coded system in pulse interference. Perform this computation for $K = 3$, 5, and 7.



FIGURE P12.29 


12.30 In coded and interleaved DS binary PSK modulation with pulse jamming and soft-decision
decoding, the cutoff rate is

$$R_0 = 1 - \log_2\left(1 + \alpha e^{-\alpha \mathcal{E}_c/N_0}\right)$$

where $\alpha$ is the fraction of the time the system is being jammed, $\mathcal{E}_c = \mathcal{E}_b R$, $R$ is the bit
rate, and $N_0 \equiv J_0$.

a. Show that the SNR per bit, $\mathcal{E}_b/N_0$, can be expressed as

$$\frac{\mathcal{E}_b}{N_0} = \frac{1}{\alpha R} \ln \frac{\alpha}{2^{1-R_0} - 1}$$

b. Determine the value of $\alpha$ that maximizes the required $\mathcal{E}_b/N_0$ (worst-case pulse
jamming) and the resulting maximum value of $\mathcal{E}_b/N_0$.

c. Plot the graph of $10 \log(\mathcal{E}_b/rN_0)$ versus $R_0$, where $r = R_0/R$, for worst-case pulse
jamming and for AWGN ($\alpha = 1$). What conclusions do you reach regarding the effect
of worst-case pulse jamming?


12.31 In a coded and interleaved FH $q$-ary FSK modulation with partial band jamming and
coherent demodulation with soft-decision decoding, the cutoff rate is

$$R_0 = \log_2\left[\frac{q}{1 + (q - 1)\,\alpha e^{-\alpha \mathcal{E}_c/2N_0}}\right]$$

where $\alpha$ is the fraction of the band being jammed, $\mathcal{E}_c$ is the chip (or tone) energy, and
$N_0 \equiv J_0$.

a. Show that the SNR per bit can be expressed as

$$\frac{\mathcal{E}_b}{N_0} = \frac{2}{\alpha R} \ln \frac{(q - 1)\alpha}{q 2^{-R_0} - 1}$$

b. Determine the value of $\alpha$ that maximizes the required $\mathcal{E}_b/N_0$ (worst-case partial band
jamming) and the resulting maximum value of $\mathcal{E}_b/N_0$.

c. Define $r = R_0/R$ in the result for $\mathcal{E}_b/N_0$ from (b), and plot $10 \log(\mathcal{E}_b/rN_0)$ versus the
normalized cutoff rate $R_0/\log_2 q$ for $q = 2, 4, 8, 16, 32$. Compare these graphs with
the results of Problem 12.30c. What conclusions do you reach regarding the effect of
worst-case partial band jamming? What is the effect of increasing the alphabet size $q$?
What is the penalty in SNR between the results in Problem 12.30c and $q$-ary FSK
as $q \to \infty$?



Fading Channels I: Characterization and Signaling 


The previous chapters have described the design and performance of digital communication
systems for transmission on either the classical AWGN channel or a linear filter
channel with AWGN. We observed that the distortion inherent in linear filter channels 
requires special signal design techniques and rather sophisticated adaptive equalization 
algorithms in order to achieve good performance. 

In this chapter, we consider the signal design, receiver structure, and receiver
performance for more complex channels, namely, channels having randomly time-variant
impulse responses. This characterization serves as a model for signal transmission
over many radio channels such as shortwave ionospheric radio communication in the
3-30 MHz frequency band (HF), tropospheric scatter (beyond-the-horizon) radio
communications in the 300-3000 MHz frequency band (UHF) and the 3000-30,000 MHz
frequency band (SHF), and ionospheric forward scatter in the 30-300 MHz frequency
band (VHF). The time-variant impulse responses of these channels are a consequence
of the constantly changing physical characteristics of the media. For example, the ions 
in the ionospheric layers that reflect the signals transmitted in the HF band are always 
in motion. To the user of the channel, the motion of the ions appears to be random. 
Consequently, if the same signal is transmitted at HF in two widely separated time 
intervals, the two received signals will be different. The time-varying responses that 
occur are treated in statistical terms. 

We shall begin our treatment of digital signaling over fading multipath chan- 
nels by first developing a statistical characterization of the channel. Then we shall 
evaluate the performance of several basic digital signaling techniques for commu- 
nication over such channels. The performance results will demonstrate the severe 
penalty in SNR that must be paid as a consequence of the fading characteristics of 
the received signal. We shall then show that the penalty in SNR can be dramati- 
cally reduced by means of efficient modulation/coding and demodulation/decoding 
techniques.

■ 13.1

CHARACTERIZATION OF FADING MULTIPATH CHANNELS

If we transmit an extremely short pulse, ideally an impulse, over a time-varying
multipath channel, the received signal might appear as a train of pulses, as shown in
Figure 13.1-1. Hence, one characteristic of a multipath medium is the time spread
introduced in the signal that is transmitted through the channel.

A second characteristic is due to the time variations in the structure of the medium. 
As a result of such time variations, the nature of the multipath varies with time. That is, 
if we repeat the pulse-sounding experiment over and over, we shall observe changes in 
the received pulse train, which will include changes in the sizes of the individual pulses, 
changes in the relative delays among the pulses, and, quite often, changes in the number 
of pulses observed in the received pulse train as shown in Figure 13.1-1. Moreover, the 
time variations appear to be unpredictable to the user of the channel. Therefore, it is 
reasonable to characterize the time-variant multipath channel statistically. Toward this 
end, let us examine the effects of the channel on a transmitted signal that is represented 
in general as 


$$s(t) = \operatorname{Re}\left[s_l(t)\, e^{j2\pi f_c t}\right] \tag{13.1-1}$$


FIGURE 13.1-1

Example of the response of a time-variant multipath channel to a very narrow pulse.
Panels (a) to (d); labels: transmitted signal, received signal.

We assume that there are multiple propagation paths. Associated with each path is
a propagation delay and an attenuation factor. Both the propagation delays and the 
attenuation factors are time-variant as a result of changes in the structure of the medium. 
Thus, the received bandpass signal may be expressed in the form 

$$x(t) = \sum_n \alpha_n(t)\, s[t - \tau_n(t)] \tag{13.1-2}$$

where $\alpha_n(t)$ is the attenuation factor for the signal received on the $n$th path and $\tau_n(t)$ is
the propagation delay for the $n$th path. Substitution for $s(t)$ from Equation 13.1-1 into
Equation 13.1-2 yields the result

$$x(t) = \operatorname{Re}\left(\left\{\sum_n \alpha_n(t)\, e^{-j2\pi f_c \tau_n(t)}\, s_l[t - \tau_n(t)]\right\} e^{j2\pi f_c t}\right) \tag{13.1-3}$$

It is apparent from Equation 13.1-3 that in the absence of noise the equivalent
lowpass received signal is

$$r_l(t) = \sum_n \alpha_n(t)\, e^{-j2\pi f_c \tau_n(t)}\, s_l[t - \tau_n(t)] \tag{13.1-4}$$

Since $r_l(t)$ is the response of an equivalent lowpass channel to the equivalent lowpass
signal $s_l(t)$, it follows that the equivalent lowpass channel is described by the
time-variant impulse response

$$c(\tau; t) = \sum_n \alpha_n(t)\, e^{-j2\pi f_c \tau_n(t)}\, \delta[\tau - \tau_n(t)] \tag{13.1-5}$$


For some channels, such as the tropospheric scatter channel, it is more appropriate 
to view the received signal as consisting of a continuum of multipath components. In 
such a case, the received signal x(t) is expressed in the integral form 

$$x(t) = \int_{-\infty}^{\infty} \alpha(\tau; t)\, s(t - \tau)\, d\tau \tag{13.1-6}$$

where $\alpha(\tau; t)$ denotes the attenuation of the signal components at delay $\tau$ and at time
instant $t$. Now substitution for $s(t)$ from Equation 13.1-1 into Equation 13.1-6 yields

$$x(t) = \operatorname{Re}\left\{\left[\int_{-\infty}^{\infty} \alpha(\tau; t)\, e^{-j2\pi f_c \tau}\, s_l(t - \tau)\, d\tau\right] e^{j2\pi f_c t}\right\} \tag{13.1-7}$$

Since the integral in Equation 13.1-7 represents the convolution of $s_l(t)$ with an
equivalent lowpass time-variant impulse response $c(\tau; t)$, it follows that

$$c(\tau; t) = \alpha(\tau; t)\, e^{-j2\pi f_c \tau} \tag{13.1-8}$$

where $c(\tau; t)$ represents the response of the channel at time $t$ due to an impulse applied at
time $t - \tau$. Thus Equation 13.1-8 is the appropriate definition of the equivalent lowpass
impulse response when the channel results in continuous multipath and Equation 13.1-5
is appropriate for a channel that contains discrete multipath components.

Now let us consider the transmission of an unmodulated carrier at frequency $f_c$.
Then $s_l(t) = 1$ for all $t$, and, hence, the received signal for the case of discrete multipath,
given by Equation 13.1-4, reduces to

$$r_l(t) = \sum_n \alpha_n(t)\, e^{-j2\pi f_c \tau_n(t)} = \sum_n \alpha_n(t)\, e^{j\theta_n(t)} \tag{13.1-9}$$

where $\theta_n(t) = -2\pi f_c \tau_n(t)$. Thus, the received signal consists of the sum of a number
of time-variant vectors (phasors) having amplitudes $\alpha_n(t)$ and phases $\theta_n(t)$. Note that
large dynamic changes in the medium are required for $\alpha_n(t)$ to change sufficiently to
cause a significant change in the received signal. On the other hand, $\theta_n(t)$ will change
by $2\pi$ rad whenever $\tau_n$ changes by $1/f_c$. But $1/f_c$ is a small number and, hence, $\theta_n$
can change by $2\pi$ rad with relatively small motions of the medium. We also expect
the delays $\tau_n(t)$ associated with the different signal paths to change at different rates
and in an unpredictable (random) manner. This implies that the received signal $r_l(t)$ in
Equation 13.1-9 can be modeled as a random process. When there are a large number
of paths, the central limit theorem can be applied. That is, $r_l(t)$ may be modeled as a
complex-valued Gaussian random process. This means that the time-variant impulse
response $c(\tau; t)$ is a complex-valued Gaussian random process in the $t$ variable.

The multipath propagation model for the channel embodied in the received signal
$r_l(t)$, given in Equation 13.1-9, results in signal fading. The fading phenomenon is
primarily a result of the time variations in the phases $\{\theta_n(t)\}$. That is, the randomly
time-variant phases $\{\theta_n(t)\}$ associated with the vectors $\{\alpha_n e^{j\theta_n}\}$ at times result in the
vectors adding destructively. When that occurs, the resultant received signal $r_l(t)$ is very
small or practically zero. At other times, the vectors $\{\alpha_n e^{j\theta_n}\}$ add constructively, so that
the received signal is large. Thus, the amplitude variations in the received signal, termed
signal fading, are due to the time-variant multipath characteristics of the channel.
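
The sensitivity of the phasor sum in Equation 13.1-9 to small delay changes can be
checked with a few lines of Python. This is only a sketch under assumed values: a 1-GHz
carrier and two equal-strength paths, neither of which comes from the text, and the
function name `received_carrier` is hypothetical.

```python
import cmath
import math


def received_carrier(alphas, taus, fc):
    """r_l(t) of Equation 13.1-9 at one instant: sum of alpha_n * exp(j*theta_n),
    with theta_n = -2*pi*fc*tau_n."""
    return sum(a * cmath.exp(-2j * math.pi * fc * tau) for a, tau in zip(alphas, taus))


fc = 1.0e9            # assumed carrier frequency (1 GHz)
alphas = [1.0, 1.0]   # two equal-strength paths (illustrative)

# Relative delay of one carrier period: the phasors add constructively.
in_phase = received_carrier(alphas, [0.0, 1.0 / fc], fc)

# Half a carrier period more (0.5 ns): the phasors cancel.
out_of_phase = received_carrier(alphas, [0.0, 1.5 / fc], fc)

print(abs(in_phase), abs(out_of_phase))
```

With these assumed numbers, a change of 0.5 ns in one path delay, which corresponds to
about 15 cm of path length, moves the received envelope from its maximum to essentially
zero while the amplitudes $\alpha_n$ are unchanged.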

When the impulse response $c(\tau; t)$ is modeled as a zero-mean complex-valued
Gaussian process, the envelope $|c(\tau; t)|$ at any instant $t$ is Rayleigh-distributed. In this
case the channel is said to be a Rayleigh fading channel. In the event that there are fixed
scatterers or signal reflectors in the medium, in addition to randomly moving scatterers,
$c(\tau; t)$ can no longer be modeled as having zero mean. In this case, the envelope $|c(\tau; t)|$
has a Rice distribution and the channel is said to be a Ricean fading channel. Another
probability distribution function that has been used to model the envelope of fading
signals is the Nakagami-$m$ distribution. These fading channel models are considered
in Section 13.1-2.
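
The central-limit argument behind the Rayleigh model can be illustrated by simulation.
The sketch below sums a moderate number of equal-amplitude phasors with independent
uniform phases and compares the fraction of deep fades with the Rayleigh prediction
$P(|c|^2 < x) = 1 - e^{-x}$ for unit average power. The number of paths, the sample
count, the seed, and the threshold 0.1 are all assumptions made for illustration.

```python
import math
import random


def envelope_samples(n_samples, n_paths=32, seed=1):
    """Envelope of a sum of n_paths equal-amplitude phasors with uniform random phases.

    Amplitudes are scaled so that the average power of the sum is 1.
    """
    rng = random.Random(seed)
    amp = 1.0 / math.sqrt(n_paths)
    out = []
    for _ in range(n_samples):
        re = im = 0.0
        for _ in range(n_paths):
            theta = rng.uniform(0.0, 2.0 * math.pi)
            re += amp * math.cos(theta)
            im += amp * math.sin(theta)
        out.append(math.hypot(re, im))
    return out


samples = envelope_samples(5000)
mean_power = sum(r * r for r in samples) / len(samples)

# Fraction of samples more than 10 dB below the average power.
frac_deep_fade = sum(r * r < 0.1 for r in samples) / len(samples)
rayleigh_prediction = 1.0 - math.exp(-0.1)
```

Under these assumptions roughly one sample in ten falls more than 10 dB below the
average power, in line with the Rayleigh prediction. Adding a fixed (nonrandom) phasor
to each sum would give the Ricean case instead.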


13.1-1 Channel Correlation Functions and Power Spectra 

We shall now develop a number of useful correlation functions and power spectral 
density functions that define the characteristics of a fading multipath channel. Our 
starting point is the equivalent lowpass impulse response $c(\tau; t)$, which is characterized
as a complex-valued random process in the $t$ variable. We assume that $c(\tau; t)$ is
wide-sense-stationary. Then we define the autocorrelation function of $c(\tau; t)$ as

$$R_c(\tau_2, \tau_1; \Delta t) = E\left[c^*(\tau_1; t)\, c(\tau_2; t + \Delta t)\right] \tag{13.1-10}$$

FIGURE 13.1-2

Multipath intensity profile.

In most radio transmission media, the attenuation and phase shift of the channel
associated with path delay $\tau_1$ is uncorrelated with the attenuation and phase shift
associated with path delay $\tau_2$. This is usually called uncorrelated scattering. We make the
assumption that the scattering at two different delays is uncorrelated and incorporate it
into Equation 13.1-10 to obtain

$$E\left[c^*(\tau_1; t)\, c(\tau_2; t + \Delta t)\right] = R_c(\tau_1; \Delta t)\, \delta(\tau_2 - \tau_1) \tag{13.1-11}$$

If we let $\Delta t = 0$, the resulting autocorrelation function $R_c(\tau; 0) \equiv R_c(\tau)$ is simply
the average power output of the channel as a function of the time delay $\tau$. For this
reason, $R_c(\tau)$ is called the multipath intensity profile or the delay power spectrum of
the channel. In general, $R_c(\tau; \Delta t)$ gives the average power output as a function of the
time delay $\tau$ and the difference $\Delta t$ in observation time.

In practice, the function $R_c(\tau; \Delta t)$ is measured by transmitting very narrow pulses
or, equivalently, a wideband signal and cross-correlating the received signal with a
delayed version of itself. Typically, the measured function $R_c(\tau)$ may appear as shown
in Figure 13.1-2. The range of values of $\tau$ over which $R_c(\tau)$ is essentially nonzero is
called the multipath spread of the channel and is denoted by $T_m$.

A completely analogous characterization of the time-variant multipath channel
begins in the frequency domain. By taking the Fourier transform of $c(\tau; t)$, we obtain
the time-variant transfer function $C(f; t)$, where $f$ is the frequency variable. Thus,

$$C(f; t) = \int_{-\infty}^{\infty} c(\tau; t)\, e^{-j2\pi f\tau}\, d\tau \tag{13.1-12}$$

If $c(\tau; t)$ is modeled as a complex-valued zero-mean Gaussian random process in the $t$
variable, it follows that $C(f; t)$ also has the same statistics. Under the assumption that
the channel is wide-sense-stationary, we define the autocorrelation function

$$R_C(f_2, f_1; \Delta t) = E\left[C^*(f_1; t)\, C(f_2; t + \Delta t)\right] \tag{13.1-13}$$


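Since $C(f; t)$ is the Fourier transform of $c(\tau; t)$, it is not surprising to find that
$R_C(f_2, f_1; \Delta t)$ is related to $R_c(\tau; \Delta t)$ by the Fourier transform. The relationship is
easily established by substituting Equation 13.1-12 into Equation 13.1-13. Thus,

$$
\begin{aligned}
R_C(f_2, f_1; \Delta t) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\left[c^*(\tau_1; t)\, c(\tau_2; t + \Delta t)\right] e^{j2\pi(f_1\tau_1 - f_2\tau_2)}\, d\tau_1\, d\tau_2 \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_c(\tau_1; \Delta t)\, \delta(\tau_2 - \tau_1)\, e^{j2\pi(f_1\tau_1 - f_2\tau_2)}\, d\tau_1\, d\tau_2 \\
&= \int_{-\infty}^{\infty} R_c(\tau_1; \Delta t)\, e^{j2\pi(f_1 - f_2)\tau_1}\, d\tau_1 \\
&= \int_{-\infty}^{\infty} R_c(\tau_1; \Delta t)\, e^{-j2\pi\Delta f\tau_1}\, d\tau_1 \equiv R_C(\Delta f; \Delta t)
\end{aligned}
\tag{13.1-14}
$$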


where $\Delta f = f_2 - f_1$. From Equation 13.1-14, we observe that $R_C(\Delta f; \Delta t)$ is the
Fourier transform of the multipath intensity profile. Furthermore, the assumption of
uncorrelated scattering implies that the autocorrelation function of $C(f; t)$ in frequency
is a function of only the frequency difference $\Delta f = f_2 - f_1$. Therefore, it is appropriate
to call $R_C(\Delta f; \Delta t)$ the spaced-frequency, spaced-time correlation function of the
channel. It can be measured in practice by transmitting a pair of sinusoids separated by
$\Delta f$ and cross-correlating the two separately received signals with a relative delay $\Delta t$.

FIGURE 13.1-3

Relationship between $R_C(\Delta f)$ (spaced-frequency correlation function) and $R_c(\tau)$
(multipath intensity profile).

Suppose we set $\Delta t = 0$ in Equation 13.1-14. Then, with $R_C(\Delta f; 0) \equiv R_C(\Delta f)$
and $R_c(\tau; 0) \equiv R_c(\tau)$, the transform relationship is simply

$$R_C(\Delta f) = \int_{-\infty}^{\infty} R_c(\tau)\, e^{-j2\pi\Delta f\tau}\, d\tau \tag{13.1-15}$$

The relationship is depicted graphically in Figure 13.1-3. Since $R_C(\Delta f)$ is an
autocorrelation function in the frequency variable, it provides us with a measure of the
frequency coherence of the channel. As a result of the Fourier transform relationship
between $R_C(\Delta f)$ and $R_c(\tau)$, the reciprocal of the multipath spread is a measure of the
coherence bandwidth of the channel. That is,

$$(\Delta f)_c \approx \frac{1}{T_m} \tag{13.1-16}$$

where $(\Delta f)_c$ denotes the coherence bandwidth. Thus, two sinusoids with frequency
separation greater than $(\Delta f)_c$ are affected differently by the channel. When an
information-bearing signal is transmitted through the channel, if $(\Delta f)_c$ is small in comparison to
the bandwidth of the transmitted signal, the channel is said to be frequency-selective.
In this case, the signal is severely distorted by the channel. On the other hand, if $(\Delta f)_c$
is large in comparison with the bandwidth of the transmitted signal, the channel is said
to be frequency-nonselective.
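
For a channel with a few discrete paths, the integral in Equation 13.1-15 reduces to a
sum over the path powers, which makes the link between multipath spread and coherence
bandwidth easy to check. The sketch below assumes a two-path profile with equal powers
and a 1-μs spread. The function names are hypothetical, and the hard comparison
in `is_frequency_selective` is a deliberate simplification of the "small in comparison
to" wording in the text.

```python
import cmath
import math


def spaced_freq_corr(delays, powers, delta_f):
    """Discrete-path form of Eq. 13.1-15: sum_k P_k * exp(-j*2*pi*delta_f*tau_k)."""
    return sum(p * cmath.exp(-2j * math.pi * delta_f * tau)
               for tau, p in zip(delays, powers))


def is_frequency_selective(signal_bandwidth, multipath_spread):
    """Crude classification: compare the signal bandwidth W with (delta f)_c ~ 1/T_m."""
    coherence_bw = 1.0 / multipath_spread
    return signal_bandwidth > coherence_bw


delays = [0.0, 1.0e-6]   # assumed two-path profile, T_m = 1 microsecond
powers = [0.5, 0.5]      # equal average powers, normalized to unit total

corr_at_zero = abs(spaced_freq_corr(delays, powers, 0.0))
corr_at_half = abs(spaced_freq_corr(delays, powers, 0.5e6))
```

For this profile the correlation magnitude falls from 1 at zero separation to
essentially 0 at a separation of 500 kHz, which is within the 1-MHz coherence bandwidth
estimate $1/T_m$. This is consistent with reading Equation 13.1-16 as an
order-of-magnitude measure and not as a sharp threshold.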

We now focus our attention on the time variations of the channel as measured by
the parameter $\Delta t$ in $R_C(\Delta f; \Delta t)$. The time variations in the channel are evidenced as
a Doppler broadening and, perhaps, in addition as a Doppler shift of a spectral line.
In order to relate the Doppler effects to the time variations of the channel, we define
the Fourier transform of $R_C(\Delta f; \Delta t)$ with respect to the variable $\Delta t$ to be the function
$S_C(\Delta f; \lambda)$. That is,

$$S_C(\Delta f; \lambda) = \int_{-\infty}^{\infty} R_C(\Delta f; \Delta t)\, e^{-j2\pi\lambda\Delta t}\, d\Delta t \tag{13.1-17}$$

With $\Delta f$ set to zero and $S_C(0; \lambda) \equiv S_C(\lambda)$, the relation in Equation 13.1-17 becomes

$$S_C(\lambda) = \int_{-\infty}^{\infty} R_C(0; \Delta t)\, e^{-j2\pi\lambda\Delta t}\, d\Delta t \tag{13.1-18}$$

The function $S_C(\lambda)$ is a power spectrum that gives the signal intensity as a function
of the Doppler frequency $\lambda$. Hence, we call $S_C(\lambda)$ the Doppler power spectrum of the
channel.

From Equation 13.1-18, we observe that if the channel is time-invariant, $R_C(\Delta t) = 1$
and $S_C(\lambda)$ becomes equal to the delta function $\delta(\lambda)$. Therefore, when there are no time
variations in the channel, there is no spectral broadening observed in the transmission
of a pure frequency tone.

The range of values of $\lambda$ over which $S_C(\lambda)$ is essentially nonzero is called the
Doppler spread $B_d$ of the channel. Since $S_C(\lambda)$ is related to $R_C(\Delta t)$ by the Fourier
transform, the reciprocal of $B_d$ is a measure of the coherence time of the channel. That
is,

$$(\Delta t)_c \approx \frac{1}{B_d} \tag{13.1-19}$$

where $(\Delta t)_c$ denotes the coherence time. Clearly, a slowly changing channel has a large
coherence time or, equivalently, a small Doppler spread. Figure 13.1-4 illustrates the
relationship between $R_C(\Delta t)$ and $S_C(\lambda)$.

FIGURE 13.1-4

Relationship between $R_C(\Delta t)$ (spaced-time correlation function) and $S_C(\lambda)$ (Doppler
power spectrum), a Fourier transform pair.

We have now established a Fourier transform relationship between $R_C(\Delta f; \Delta t)$
and $R_c(\tau; \Delta t)$ involving the variables $(\tau, \Delta f)$, and a Fourier transform relationship
between $R_C(\Delta f; \Delta t)$ and $S_C(\Delta f; \lambda)$ involving the variables $(\Delta t, \lambda)$. There are two
additional Fourier transform relationships that we can define, which serve to relate
$R_c(\tau; \Delta t)$ to $S_C(\Delta f; \lambda)$ and, thus, close the loop. The desired relationship is obtained
by defining a new function, denoted by $S(\tau; \lambda)$, to be the Fourier transform of $R_c(\tau; \Delta t)$
in the $\Delta t$ variable. That is,

$$S(\tau; \lambda) = \int_{-\infty}^{\infty} R_c(\tau; \Delta t)\, e^{-j2\pi\lambda\Delta t}\, d\Delta t \tag{13.1-20}$$

It follows that $S(\tau; \lambda)$ and $S_C(\Delta f; \lambda)$ are a Fourier transform pair. That is,

$$S(\tau; \lambda) = \int_{-\infty}^{\infty} S_C(\Delta f; \lambda)\, e^{j2\pi\tau\Delta f}\, d\Delta f \tag{13.1-21}$$

Furthermore, $S(\tau; \lambda)$ and $R_C(\Delta f; \Delta t)$ are related by the double Fourier transform

$$S(\tau; \lambda) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} R_C(\Delta f; \Delta t)\, e^{-j2\pi\lambda\Delta t}\, e^{j2\pi\tau\Delta f}\, d\Delta t\, d\Delta f \tag{13.1-22}$$

This new function $S(\tau; \lambda)$ is called the scattering function of the channel. It provides
us with a measure of the average power output of the channel as a function of the time
delay $\tau$ and the Doppler frequency $\lambda$.

The relationships among the four functions $R_C(\Delta f; \Delta t)$, $R_c(\tau; \Delta t)$, $S_C(\Delta f; \lambda)$,
and $S(\tau; \lambda)$ are summarized in Figure 13.1-5.
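
One practical consequence of these transform pairs is that both one-dimensional profiles
can be read off a sampled scattering function: inverting Equation 13.1-20 at $\Delta t = 0$
gives $R_c(\tau) = \int S(\tau; \lambda)\, d\lambda$, and inverting Equation 13.1-21 at $\Delta f = 0$
gives $S_C(\lambda) = \int S(\tau; \lambda)\, d\tau$. The sketch below approximates both integrals by
Riemann sums on a uniform grid. The grid values and the function name are made up for
illustration.

```python
def delay_and_doppler_profiles(scattering, d_tau, d_lambda):
    """scattering[i][k] holds samples of S(tau_i; lambda_k) on a uniform grid.

    Returns (delay_profile, doppler_spectrum): Riemann-sum approximations of
    R_c(tau_i) = integral of S over lambda and S_C(lambda_k) = integral of S over tau.
    """
    delay_profile = [sum(row) * d_lambda for row in scattering]
    doppler_spectrum = [sum(col) * d_tau for col in zip(*scattering)]
    return delay_profile, doppler_spectrum


# Hypothetical 2 x 3 grid: two delay bins 0.1 microsecond apart, three Doppler bins 1 Hz apart.
grid = [
    [1.0, 2.0, 1.0],
    [0.0, 1.0, 0.0],
]
delay_profile, doppler_spectrum = delay_and_doppler_profiles(grid, d_tau=0.1e-6, d_lambda=1.0)

# Both profiles must integrate to the same total average power.
total_from_delay = sum(delay_profile) * 0.1e-6
total_from_doppler = sum(doppler_spectrum) * 1.0
```

Summing over rows or columns in this way is how the multipath spread and the Doppler
spread can be estimated from a measured scattering function such as the one discussed
in the next example.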


EXAMPLE 13.1-1. SCATTERING FUNCTION OF A TROPOSPHERIC SCATTER CHANNEL.
The scattering function $S(\tau; \lambda)$ measured on a 150-mi tropospheric scatter link is
shown in Figure 13.1-6. The signal used to probe the channel had a time resolution
of 0.1 μs. Hence, the time-delay axis is quantized in increments of 0.1 μs. From the
graph, we observe that the multipath spread $T_m = 0.7$ μs. On the other hand, the
Doppler spread, which may be defined as the 3-dB bandwidth of the power spectrum
for each signal path, appears to vary with each signal path. For example, in one path it is
less than 1 Hz, while in some other paths it is several hertz. For our purposes, we shall
take the largest of these 3-dB bandwidths of the various paths and call that the Doppler
spread.


FIGURE 13.1-5

Relationships among the channel correlation functions and power spectra. [From Green
(1962), with permission.]


EXAMPLE 13.1-2. MULTIPATH INTENSITY PROFILE OF MOBILE RADIO CHANNELS. The
multipath intensity profile of a mobile radio channel depends critically on the type of
terrain. Numerous measurements have been made under various conditions in many
parts of the world. In urban and suburban areas, typical values of multipath spreads
range from 1 to 10 $\mu$s. In rural mountainous areas, the multipath spreads are much
greater, with typical values in the range of 10 to 30 $\mu$s. Two models for the multipath
intensity profile that are widely used in evaluating system performance for these two
types of terrain are illustrated in Figure 13.1-7.

FIGURE 13.1-6

Scattering function of a medium-range tropospheric scatter channel. The taps delay increment
is 0.1 $\mu$s.


EXAMPLE 13.1-3. DOPPLER POWER SPECTRUM OF MOBILE RADIO CHANNELS. A
widely used model for the Doppler power spectrum of a mobile radio channel is the
so-called Jakes' model (Jakes, 1974). In this model, the autocorrelation of the time-variant
transfer function $C(f; t)$ is given as

$$R_C(\Delta t) = E[C^*(f; t)\, C(f; t + \Delta t)] = J_0(2\pi f_m \Delta t)$$

where $J_0(\cdot)$ is the zero-order Bessel function of the first kind and $f_m = v f_0/c$ is the
maximum Doppler frequency, where $v$ is the vehicle speed in meters per second (m/s),
$f_0$ is the carrier frequency, and $c$ is the speed of light ($3 \times 10^8$ m/s). The Fourier
transform of this autocorrelation function yields the Doppler power spectrum. That is,

$$S_C(\lambda) = \int_{-\infty}^{\infty} R_C(\Delta t)\, e^{-j2\pi\lambda\Delta t}\, d\Delta t
= \int_{-\infty}^{\infty} J_0(2\pi f_m \Delta t)\, e^{-j2\pi\lambda\Delta t}\, d\Delta t
= \begin{cases} \dfrac{1}{\pi f_m}\, \dfrac{1}{\sqrt{1 - (\lambda/f_m)^2}} & |\lambda| \le f_m \\ 0 & |\lambda| > f_m \end{cases}$$

The graph of $S_C(\lambda)$ is shown in Figure 13.1-8.
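
As a rough numerical illustration of Jakes' model, the following Python sketch computes the maximum Doppler frequency $f_m = v f_0/c$ and evaluates the Doppler power spectrum at a given Doppler frequency. The function names, the 900-MHz carrier, and the 30 m/s vehicle speed are assumptions chosen for illustration, and the spectrum is written with the normalizing factor $1/(\pi f_m)$ so that it integrates to $R_C(0) = J_0(0) = 1$.

```python
import math

C_LIGHT = 3.0e8  # speed of light in m/s, the value quoted in the text


def max_doppler(v_mps, f0_hz):
    """Maximum Doppler frequency f_m = v * f0 / c, in Hz."""
    return v_mps * f0_hz / C_LIGHT


def jakes_spectrum(lam_hz, fm_hz):
    """Jakes Doppler power spectrum S_C(lambda).

    The band edges |lambda| = f_m, where the expression is unbounded,
    are treated here as outside the band and return 0.
    """
    if abs(lam_hz) >= fm_hz:
        return 0.0
    return 1.0 / (math.pi * fm_hz * math.sqrt(1.0 - (lam_hz / fm_hz) ** 2))


# Hypothetical example: 30 m/s (108 km/h) vehicle and a 900-MHz carrier.
fm = max_doppler(30.0, 900e6)      # 90 Hz
s_center = jakes_spectrum(0.0, fm)  # minimum of the U-shaped spectrum
```

The spectrum takes its smallest in-band value at $\lambda = 0$ and grows toward the band edges, which is the U shape associated with this model.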


13.1-2 Statistical Models for Fading Channels 

FIGURE 13.1-7

Cost 207 average power delay profiles: (a) typical delay profile for suburban and urban areas;
(b) typical "bad"-case delay profile for hilly terrain. [From Cost 207 Document 207 TD (86)51
rev 5.]


FIGURE 13.1-8

Model of Doppler spectrum for a mobile radio channel.


There are several probability distributions that can be considered in attempting to model
the statistical characteristics of the fading channel. When there are a large number of
scatterers in the channel that contribute to the signal at the receiver, as is the case in
ionospheric or tropospheric signal propagation, application of the central limit theorem
leads to a Gaussian process model for the channel impulse response. If the process is
zero-mean, then the envelope of the channel response at any time instant has a Rayleigh
probability distribution and the phase is uniformly distributed in the interval $(0, 2\pi)$.
That is,

$$p_R(r) = \frac{2r}{\Omega}\, e^{-r^2/\Omega}, \qquad r \ge 0 \tag{13.1-23}$$

where

$$\Omega = E(R^2) \tag{13.1-24}$$

We observe that the Rayleigh distribution is characterized by the single parameter
$E(R^2)$.
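
The link between the zero-mean Gaussian model and the Rayleigh envelope can be checked numerically. The sketch below is a minimal illustration, not part of the original text: it draws the envelope of a zero-mean complex Gaussian variable whose real and imaginary parts each have variance $\Omega/2$, and compares the sample mean of $R^2$ with $\Omega$. The function names, the seed, and the sample size are hypothetical choices, and the PDF is taken to be the standard Rayleigh form with the single parameter $\Omega = E(R^2)$.

```python
import math
import random


def rayleigh_envelope_samples(omega, n, seed=0):
    """Draw n envelope samples R = |X + jY| with E(R^2) = omega.

    X and Y are independent zero-mean Gaussians, each with variance omega / 2.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(omega / 2.0)
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]


def rayleigh_pdf(r, omega):
    """Rayleigh PDF p_R(r) = (2 r / omega) exp(-r^2 / omega) for r >= 0."""
    if r < 0.0:
        return 0.0
    return (2.0 * r / omega) * math.exp(-r * r / omega)


samples = rayleigh_envelope_samples(omega=2.0, n=20000)
mean_square = sum(r * r for r in samples) / len(samples)  # should be close to omega
```

With 20,000 samples the estimate of $E(R^2)$ typically agrees with $\Omega$ to within a few percent.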


An alternative statistical model for the envelope of the channel response is the
Nakagami-$m$ distribution given by the PDF in Equation 2.3-67. In contrast to the
Rayleigh distribution, which has a single parameter that can be used to match the fading
channel statistics, the Nakagami-$m$ is a two-parameter distribution, involving the
parameter $m$ and the second moment $\Omega = E(R^2)$. As a consequence, this distribution
provides more flexibility and accuracy in matching the observed signal statistics. The
Nakagami-$m$ distribution can be used to model fading channel conditions that are either
more or less severe than the Rayleigh distribution, and it includes the Rayleigh distribution
as a special case ($m = 1$). For example, Turin et al. (1972) and Suzuki (1977) have
shown that the Nakagami-$m$ distribution provides the best fit for data signals received
in urban radio multipath channels.

The Rice distribution is also a two-parameter distribution. It may be expressed by
the PDF given in Equation 2.3-56, where the parameters are $s$ and $\sigma^2$, and $s^2$ is called
the noncentrality parameter in the equivalent chi-square distribution. It represents the
power in the nonfading signal components, sometimes called specular components, of
the received signal.

There are many radio channels in which fading is encountered that are basically
line-of-sight (LOS) communication links with multipath components arising from secondary
reflections, or signal paths, from surrounding terrain. In such channels, the number of
multipath components is small, and, hence, the channel may be modeled in a somewhat
simpler form. We cite two channel models as examples.

As the first example, let us consider an airplane-to-ground communication link in
which there is the direct path and a single multipath component at a delay $\tau_0$ relative to
the direct path. The impulse response of such a channel may be modeled as

$$c(\tau; t) = \alpha\, \delta(\tau) + \beta(t)\, \delta[\tau - \tau_0(t)] \tag{13.1-25}$$

where $\alpha$ is the attenuation factor of the direct path and $\beta(t)$ represents the time-variant
multipath signal component resulting from terrain reflections. Often, $\beta(t)$ can be
characterized as a zero-mean Gaussian random process. The transfer function for this channel
model may be expressed as

$$C(f; t) = \alpha + \beta(t)\, e^{-j2\pi f \tau_0(t)} \tag{13.1-26}$$

This channel fits the Ricean fading model defined previously. The direct path with
attenuation $\alpha$ represents the specular component and $\beta(t)$ represents the Rayleigh fading
component.

A similar model has been found to hold for microwave LOS radio channels used
for long-distance voice and video transmission by telephone companies throughout the
world. For such channels, Rummler (1979) has developed a three-path model based on
channel measurements performed on typical LOS links in the 6-GHz frequency band.
The differential delay on the two multipath components is relatively small, and, hence,
the model developed by Rummler is one that has a channel transfer function

$$C(f) = \alpha\left[1 - \beta e^{-j2\pi(f - f_0)\tau_0}\right] \tag{13.1-27}$$

where $\alpha$ is the overall attenuation parameter, $\beta$ is called a shape parameter which is due
to the multipath components, $f_0$ is the frequency of the fade minimum, and $\tau_0$ is the
relative time delay between the direct and the multipath components. This simplified
model was used to fit data derived from channel measurements.

Rummler found that the parameters $\alpha$ and $\beta$ may be characterized as random
variables that, for practical purposes, are nearly statistically independent. From the
channel measurements, he found that the distribution of $\beta$ has the form $(1 - \beta)^{2.3}$.
The distribution of $\alpha$ is well modeled by the lognormal distribution, i.e., $-\log \alpha$ is
Gaussian. For $\beta > 0.5$, the mean of $-20 \log \alpha$ was found to be 25 dB and the standard
deviation was 5 dB. For smaller values of $\beta$, the mean decreases to 15 dB. The delay
parameter determined from the measurements was $\tau_0 = 6.3$ ns. The magnitude-square
response of $C(f)$ is

$$|C(f)|^2 = \alpha^2\left[1 + \beta^2 - 2\beta \cos 2\pi(f - f_0)\tau_0\right] \tag{13.1-28}$$

$|C(f)|$ is plotted in Figure 13.1-9 as a function of the frequency $f - f_0$ for $\tau_0 = 6.3$ ns.
Note that the effect of the multipath component is to create a deep attenuation at $f = f_0$
and at multiples of $1/\tau_0 \approx 159$ MHz. By comparison, the typical channel bandwidth
is 30 MHz. This model was used by Lundgren and Rummler (1979) to determine the
error rate performance of digital radio systems.
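
The notch structure of the Rummler model can be explored with a few lines of Python. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the frequency argument is the offset $f - f_0$ from the fade minimum, and the values $\alpha = 1$ and $\beta = 0.9$ used below are illustrative rather than measured.

```python
import math


def rummler_mag_sq(f_offset_hz, alpha, beta, tau0_s=6.3e-9):
    """|C(f)|^2 = alpha^2 [1 + beta^2 - 2 beta cos(2 pi (f - f0) tau0)].

    f_offset_hz is the offset f - f0 from the frequency of the fade minimum.
    """
    phase = 2.0 * math.pi * f_offset_hz * tau0_s
    return alpha ** 2 * (1.0 + beta ** 2 - 2.0 * beta * math.cos(phase))


def notch_depth_db(beta):
    """Ratio of the largest to the smallest |C(f)|^2 in dB, for 0 <= beta < 1."""
    return 20.0 * math.log10((1.0 + beta) / (1.0 - beta))


notch_spacing_hz = 1.0 / 6.3e-9              # spacing between successive fade minima
at_notch = rummler_mag_sq(0.0, 1.0, 0.9)      # (1 - beta)^2 at the fade minimum
at_peak = rummler_mag_sq(notch_spacing_hz / 2.0, 1.0, 0.9)  # (1 + beta)^2 midway
```

Because the notch spacing is much larger than the 30-MHz channel bandwidth quoted in the text, at most one fade minimum falls inside the channel at a time.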

FIGURE 13.1-9

Magnitude frequency response of LOS channel model.


Propagation models for mobile radio channels In the link budget calculations
that were described in Section 4.10-2, we had characterized the path loss of radio
waves propagating through free space as being inversely proportional to $d^2$, where $d$
is the distance between the transmitter and the receiver. However, in a mobile radio
channel, propagation is generally neither free space nor line of sight. The mean path
loss encountered in mobile radio channels may be characterized as being inversely
proportional to $d^p$, where $2 \le p \le 4$, with $d^4$ being a worst-case model. Consequently,
the path loss is usually much more severe compared to that of free space.

There are a number of factors affecting the path loss in mobile radio communications.
Among these factors are base station antenna height, mobile antenna height,
operating frequency, atmospheric conditions, and presence or absence of buildings and
trees. Various mean path loss models have been developed that incorporate such factors.
For example, a model for a large city in an urban area is the Hata model, in which the
mean path loss is expressed as

$$\text{Loss in dB} = 69.55 + 26.16 \log_{10} f - 13.82 \log_{10} h_t - a(h_r) + (44.9 - 6.55 \log_{10} h_t) \log_{10} d \tag{13.1-29}$$

where $f$ is the operating frequency in MHz ($150 \le f \le 1500$), $h_t$ is the transmitter
antenna height in meters ($30 \le h_t \le 200$), $h_r$ is the receiver antenna height in meters
($1 \le h_r \le 10$), $d$ is the distance between transmitter and receiver in km ($1 \le d \le 20$),
and

$$a(h_r) = 3.2(\log_{10} 11.75 h_r)^2 - 4.97, \qquad f \ge 400 \text{ MHz} \tag{13.1-30}$$

Another problem with mobile radio propagation is the effect of shadowing of the
signal due to large obstructions, such as large buildings, trees, and hilly terrain between
the transmitter and the receiver. Shadowing is usually modeled as a multiplicative and,
generally, slowly time-varying random process. That is, the received signal may be
characterized mathematically as

$$r(t) = A_0\, g(t)\, s(t) \tag{13.1-31}$$

where $A_0$ represents the mean path loss, $s(t)$ is the transmitted signal, and $g(t)$ is a
random process that represents the shadowing effect. At any time instant, the shadowing
process is modeled statistically as lognormally distributed. The probability density
function for the lognormal distribution is

$$p(g) = \begin{cases} \dfrac{1}{\sqrt{2\pi\sigma^2}\, g}\, e^{-(\ln g - \mu)^2/2\sigma^2} & (g \ge 0) \\ 0 & (g < 0) \end{cases} \tag{13.1-32}$$

If we define a new random variable $X$ as $X = \ln g$, then

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \mu)^2/2\sigma^2}, \qquad -\infty < x < \infty \tag{13.1-33}$$

The random variable $X$ represents the path loss measured in dB, $\mu$ is the mean path
loss in dB, and $\sigma$ is the standard deviation of the path loss in dB. For typical cellular
and microcellular environments, $\sigma$ is in the range of 5-12 dB.
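
To make the Hata model of Equations 13.1-29 and 13.1-30 concrete, the sketch below evaluates the mean path loss for one set of parameters. It is an illustration only: the function names and the example link (900 MHz, 50-m base station antenna, 1.5-m mobile antenna, 5 km) are assumptions, and the range check is restricted to $f \ge 400$ MHz because that is the range quoted for the correction term $a(h_r)$.

```python
import math


def hata_correction(h_r_m):
    """Mobile antenna correction a(h_r) of Equation 13.1-30 (large city, f >= 400 MHz)."""
    return 3.2 * math.log10(11.75 * h_r_m) ** 2 - 4.97


def hata_loss_db(f_mhz, h_t_m, h_r_m, d_km):
    """Mean path loss in dB from Equation 13.1-29."""
    in_range = (400.0 <= f_mhz <= 1500.0 and 30.0 <= h_t_m <= 200.0
                and 1.0 <= h_r_m <= 10.0 and 1.0 <= d_km <= 20.0)
    if not in_range:
        raise ValueError("parameters outside the range quoted for the model")
    return (69.55 + 26.16 * math.log10(f_mhz) - 13.82 * math.log10(h_t_m)
            - hata_correction(h_r_m)
            + (44.9 - 6.55 * math.log10(h_t_m)) * math.log10(d_km))


# Hypothetical link: 900 MHz, 50-m base station, 1.5-m mobile antenna, 5 km.
loss = hata_loss_db(900.0, 50.0, 1.5, 5.0)

# Slope of the loss versus distance, in dB per decade of distance.
slope = hata_loss_db(900.0, 50.0, 1.5, 10.0) - hata_loss_db(900.0, 50.0, 1.5, 1.0)
```

For this example the loss is close to 147 dB and the slope is about 33.8 dB per decade, which corresponds to a path loss exponent $p$ of roughly 3.4, inside the range $2 \le p \le 4$ mentioned earlier.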
13.2 THE EFFECT OF SIGNAL CHARACTERISTICS ON THE CHOICE
OF A CHANNEL MODEL


Having discussed the statistical characterization of time-variant multipath channels
generally in terms of the correlation functions described in Section 13.1, we now consider
the effect of signal characteristics on the selection of a channel model that is appropriate
for the specified signal. Thus, let $s_l(t)$ be the equivalent lowpass signal transmitted over
the channel and let $S_l(f)$ denote its frequency content. Then the equivalent lowpass
received signal, exclusive of additive noise, may be expressed either in terms of the
time-domain variables $c(\tau; t)$ and $s_l(t)$ as

$$r_l(t) = \int_{-\infty}^{\infty} c(\tau; t)\, s_l(t - \tau)\, d\tau \tag{13.2-1}$$

or in terms of the frequency functions $C(f; t)$ and $S_l(f)$ as

$$r_l(t) = \int_{-\infty}^{\infty} C(f; t)\, S_l(f)\, e^{j2\pi f t}\, df \tag{13.2-2}$$

Suppose we are transmitting digital information over the channel by modulating
(either in amplitude, or in phase, or both) the basic pulse $s_l(t)$ at a rate $1/T$, where
$T$ is the signaling interval. It is apparent from Equation 13.2-2 that the time-variant
channel characterized by the transfer function $C(f; t)$ distorts the signal $S_l(f)$. If
$S_l(f)$ has a bandwidth $W$ greater than the coherence bandwidth $(\Delta f)_c$ of the channel,
$S_l(f)$ is subjected to different gains and phase shifts across the band. In such a case,
the channel is said to be frequency-selective. Additional distortion is caused by the
time variations in $C(f; t)$. This type of distortion is evidenced as a variation in the
received signal strength, and has been termed fading. It should be emphasized that the
frequency selectivity and fading are viewed as two different types of distortion. The
former depends on the multipath spread or, equivalently, on the coherence bandwidth
of the channel relative to the transmitted signal bandwidth $W$. The latter depends on
the time variations of the channel, which are grossly characterized by the coherence
time $(\Delta t)_c$ or, equivalently, by the Doppler spread $B_d$.

The effect of the channel on the transmitted signal $s_l(t)$ is a function of our choice of
signal bandwidth and signal duration. For example, if we select the signaling interval
$T$ to satisfy the condition $T \gg T_m$, the channel introduces a negligible amount of
intersymbol interference. If the bandwidth of the signal pulse $s_l(t)$ is $W \approx 1/T$, the
condition $T \gg T_m$ implies that

$$W \ll \frac{1}{T_m} \approx (\Delta f)_c \tag{13.2-3}$$

That is, the signal bandwidth $W$ is much smaller than the coherence bandwidth of the
channel. Hence, the channel is frequency-nonselective. In other words, all the frequency
components in $S_l(f)$ undergo the same attenuation and phase shift in transmission
through the channel. But this implies that, within the bandwidth occupied by $S_l(f)$,
the time-variant transfer function $C(f; t)$ of the channel is a complex-valued constant
in the frequency variable. Since $S_l(f)$ has its frequency content concentrated in the
vicinity of $f = 0$, $C(f; t) = C(0; t)$. Consequently, Equation 13.2-2 reduces to

$$r_l(t) = C(0; t) \int_{-\infty}^{\infty} S_l(f)\, e^{j2\pi f t}\, df = C(0; t)\, s_l(t) \tag{13.2-4}$$

Thus, when the signal bandwidth $W$ is much smaller than the coherence bandwidth
$(\Delta f)_c$ of the channel, the received signal is simply the transmitted signal multiplied by
a complex-valued random process $C(0; t)$, which represents the time-variant characteristics
of the channel. In this case, we say that the multipath components in the received
signal are not resolvable because $W \ll (\Delta f)_c$.

The transfer function $C(0; t)$ for a frequency-nonselective channel may be expressed
in the form

$$C(0; t) = \alpha(t)\, e^{j\phi(t)} \tag{13.2-5}$$

where $\alpha(t)$ represents the envelope and $\phi(t)$ represents the phase of the equivalent
lowpass channel. When $C(0; t)$ is modeled as a zero-mean complex-valued Gaussian
random process, the envelope $\alpha(t)$ is Rayleigh-distributed for any fixed value of $t$ and
$\phi(t)$ is uniformly distributed over the interval $(-\pi, \pi)$. The rapidity of the fading on
the frequency-nonselective channel is determined either from the correlation function
$R_C(\Delta t)$ or from the Doppler power spectrum $S_C(\lambda)$. Alternatively, either of the channel
parameters $(\Delta t)_c$ or $B_d$ can be used to characterize the rapidity of the fading.

For example, suppose it is possible to select the signal bandwidth $W$ to satisfy the
condition $W \ll (\Delta f)_c$ and the signaling interval $T$ to satisfy the condition $T \ll (\Delta t)_c$.
Since $T$ is smaller than the coherence time of the channel, the channel attenuation and
phase shift are essentially fixed for the duration of at least one signaling interval. When
this condition holds, we call the channel a slowly fading channel. Furthermore, when
$W \approx 1/T$, the conditions that the channel be frequency-nonselective and slowly fading
imply that the product of $T_m$ and $B_d$ must satisfy the condition $T_m B_d < 1$.

The product $T_m B_d$ is called the spread factor of the channel. If $T_m B_d < 1$, the
channel is said to be underspread; otherwise, it is overspread. The multipath spread,
the Doppler spread, and the spread factor are listed in Table 13.2-1 for several channels.
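
The conditions above can be collected into a small classification routine. The following Python sketch is only a rough illustration: it assumes $W \approx 1/T$, $(\Delta f)_c \approx 1/T_m$, and $(\Delta t)_c \approx 1/B_d$, and it replaces the "much less than" conditions with a hypothetical factor-of-10 margin. The function name, the margin, and the 100-microsecond symbol interval are not from the text.

```python
def classify_channel(t_m_s, b_d_hz, t_sym_s, margin=10.0):
    """Rough channel classification from T_m, B_d, and the signaling interval T.

    margin is an illustrative stand-in for the "<<" conditions in the text.
    """
    w_hz = 1.0 / t_sym_s              # signal bandwidth, W ~ 1/T
    coherence_bw_hz = 1.0 / t_m_s     # (delta f)_c ~ 1/T_m
    coherence_time_s = 1.0 / b_d_hz   # (delta t)_c ~ 1/B_d
    spread_factor = t_m_s * b_d_hz
    return {
        "spread_factor": spread_factor,
        "underspread": spread_factor < 1.0,
        "frequency_nonselective": w_hz * margin <= coherence_bw_hz,
        "slowly_fading": t_sym_s * margin <= coherence_time_s,
    }


# Tropospheric scatter values from Table 13.2-1 (T_m = 1e-6 s, B_d = 10 Hz)
# with a hypothetical signaling interval of 100 microseconds.
result = classify_channel(t_m_s=1e-6, b_d_hz=10.0, t_sym_s=1e-4)
```

For an underspread channel such as this one there is a wide range of signaling intervals between $T_m$ and $1/B_d$ for which both conditions hold; for an overspread channel no such interval exists.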


TABLE 13.2-1

Multipath Spread, Doppler Spread, and Spread Factor for Several Time-Variant
Multipath Channels

Type of channel                             Multipath         Doppler         Spread
                                            duration, s       spread, Hz      factor

Shortwave ionospheric propagation (HF)      10^-3 to 10^-2    10^-1 to 1      10^-4 to 10^-2
Ionospheric propagation under disturbed     10^-3 to 10^-2    10 to 100       10^-2 to 1
  auroral conditions (HF)
Ionospheric forward scatter (VHF)           10^-4             10              10^-3
Tropospheric scatter (SHF)                  10^-6             10              10^-5
Orbital scatter (X band)                    10^-4             10^3            10^-1
Moon at max. libration (f0 = 0.4 kmc)       10^-2             10              10^-1


We observe from this table that several radio channels, including the moon when used
as a passive reflector, are underspread. Consequently, it is possible to select the signal
$s_l(t)$ such that these channels are frequency-nonselective and slowly fading. The
slow-fading condition implies that the channel characteristics vary sufficiently slowly that
they can be measured.

In Section 13.3, we shall determine the error rate performance for binary signaling 
over a frequency-nonselective slowly fading channel. This channel model is, by far, the 
simplest to analyze. More importantly, it yields insight into the performance character- 
istics for digital signaling on a fading channel and serves to suggest the type of signal 
waveforms that are effective in overcoming the fading caused by the channel. 

Since the multipath components in the received signal are not resolvable when the
signal bandwidth $W$ is less than the coherence bandwidth $(\Delta f)_c$ of the channel, the
received signal appears to arrive at the receiver via a single fading path. On the other
hand, we may choose $W \gg (\Delta f)_c$, so that the channel becomes frequency-selective.
We shall show later that, under this condition, the multipath components in the received
signal are resolvable with a resolution in time delay of $1/W$. Thus, we shall illustrate
that the frequency-selective channel can be modeled as a tapped delay line (transversal)
filter with time-variant tap coefficients. We shall then derive the performance of binary
signaling over such a frequency-selective channel model.


13.3 FREQUENCY-NONSELECTIVE, SLOWLY FADING CHANNEL

In this section, we derive the error rate performance of binary PSK and binary FSK when
these signals are transmitted over a frequency-nonselective, slowly fading channel. As
described in Section 13.2, the frequency-nonselective channel results in multiplicative
distortion of the transmitted signal $s_l(t)$. Furthermore, the condition that the channel
fades slowly implies that the multiplicative process may be regarded as a constant
during at least one signaling interval. Consequently, if the transmitted signal is $s_l(t)$,
the received equivalent lowpass signal in one signaling interval is

$$r_l(t) = \alpha e^{j\phi} s_l(t) + z(t), \qquad 0 \le t \le T \tag{13.3-1}$$

where $z(t)$ represents the complex-valued white Gaussian noise process corrupting the
signal.

Let us assume that the channel fading is sufficiently slow that the phase shift $\phi$ can
be estimated from the received signal without error. In that case, we can achieve ideal
coherent detection of the received signal. Thus, the received signal can be processed
by passing it through a matched filter in the case of binary PSK or through a pair of
matched filters in the case of binary FSK. One method that we can use to determine the
performance of the binary communication systems is to evaluate the decision variables
and from these determine the probability of error. However, we have already done
this for a fixed (time-invariant) channel. That is, for a fixed attenuation $\alpha$, we know
the probability of error for binary PSK and binary FSK. From Equation 4.3-13, the
expression for the error rate of binary PSK as a function of the received SNR $\gamma_b$ is

$$P_b(\gamma_b) = Q\left(\sqrt{2\gamma_b}\right) \tag{13.3-2}$$

where $\gamma_b = \alpha^2 \mathcal{E}_b/N_0$. The expression for the error rate of binary FSK, detected
coherently, is given by Equation 4.2-32 as

$$P_b(\gamma_b) = Q\left(\sqrt{\gamma_b}\right) \tag{13.3-3}$$

We view Equations 13.3-2 and 13.3-3 as conditional error probabilities, where the
condition is that $\alpha$ is fixed. To obtain the error probabilities when $\alpha$ is random, we must
average $P_b(\gamma_b)$, given in Equations 13.3-2 and 13.3-3, over the probability density
function of $\gamma_b$. That is, we must evaluate the integral

$$P_b = \int_0^{\infty} P_b(\gamma_b)\, p(\gamma_b)\, d\gamma_b \tag{13.3-4}$$

where $p(\gamma_b)$ is the probability density function of $\gamma_b$ when $\alpha$ is random.


Rayleigh fading When $\alpha$ is Rayleigh-distributed, $\alpha^2$ has a chi-square probability
distribution with two degrees of freedom. Consequently, $\gamma_b$ also is
chi-square-distributed. It is easily shown that

$$p(\gamma_b) = \frac{1}{\bar{\gamma}_b}\, e^{-\gamma_b/\bar{\gamma}_b}, \qquad \gamma_b \ge 0 \tag{13.3-5}$$

where $\bar{\gamma}_b$ is the average signal-to-noise ratio, defined as

$$\bar{\gamma}_b = \frac{\mathcal{E}_b}{N_0}\, E(\alpha^2) \tag{13.3-6}$$

The term $E(\alpha^2)$ is simply the average value of $\alpha^2$.

Now we can substitute Equation 13.3-5 into Equation 13.3-4 and carry out the
integration for $P_b(\gamma_b)$ as given by Equations 13.3-2 and 13.3-3. The result of this
integration for binary PSK is (see Problems 4.44 and 4.50)

$$P_b = \frac{1}{2}\left(1 - \sqrt{\frac{\bar{\gamma}_b}{1 + \bar{\gamma}_b}}\right) \tag{13.3-7}$$

If we repeat the integration with $P_b(\gamma_b)$ given by Equation 13.3-3, we obtain the
probability of error for binary FSK, detected coherently, in the form

$$P_b = \frac{1}{2}\left(1 - \sqrt{\frac{\bar{\gamma}_b}{2 + \bar{\gamma}_b}}\right) \tag{13.3-8}$$
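
The averaging step in Equation 13.3-4 can be checked numerically. The sketch below is an illustration under stated assumptions rather than the derivation used in the text: it integrates the conditional PSK error probability $Q(\sqrt{2\gamma_b})$ against the exponential PDF of $\gamma_b$ with a simple midpoint rule and compares the result with the closed form $\frac{1}{2}\left(1 - \sqrt{\bar{\gamma}_b/(1 + \bar{\gamma}_b)}\right)$. The function names, the truncation point, the step count, and the 10-dB operating point are hypothetical choices.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def average_over_rayleigh(cond_error, mean_snr, steps=20000):
    """Midpoint-rule estimate of the integral in Equation 13.3-4.

    Uses the exponential PDF p(g) = exp(-g / mean_snr) / mean_snr and
    truncates the integral at 40 * mean_snr, where the PDF is negligible.
    """
    upper = 40.0 * mean_snr
    h = upper / steps
    total = 0.0
    for i in range(steps):
        g = (i + 0.5) * h
        total += cond_error(g) * math.exp(-g / mean_snr) / mean_snr
    return total * h


def psk_rayleigh_closed_form(mean_snr):
    """Closed-form average error probability for coherent binary PSK."""
    return 0.5 * (1.0 - math.sqrt(mean_snr / (1.0 + mean_snr)))


mean_snr = 10.0  # average SNR per bit of 10 dB, a hypothetical operating point
numeric = average_over_rayleigh(lambda g: q_function(math.sqrt(2.0 * g)), mean_snr)
closed = psk_rayleigh_closed_form(mean_snr)
```

Passing `lambda g: q_function(math.sqrt(g))` instead gives the corresponding check for coherent FSK.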


In arriving at the error rate results in Equations 13.3-7 and 13.3-8, we have assumed 
that the estimate of the channel phase shift, obtained in the presence of slow fading, 
is noiseless. Such an ideal condition may not hold in practice. In such a case, the 
expressions in Equations 13.3-7 and 13.3-8 should be viewed as representing the best 
achievable performance in the presence of Rayleigh fading. In Appendix C we consider
the problem of estimating the phase in the presence of noise and we evaluate the error
rate performance of binary and multiphase PSK.

On channels for which the fading is sufficiently rapid to preclude the estimation
of a stable phase reference by averaging the received signal phase over many signaling
intervals, DPSK is an alternative signaling method. Since DPSK requires phase stability
over only two consecutive signaling intervals, this modulation technique is quite robust
in the presence of signal fading. In deriving the performance of binary DPSK for a
fading channel, we begin again with the error probability for a nonfading channel,
which is

$$P_b(\gamma_b) = \tfrac{1}{2}\, e^{-\gamma_b} \tag{13.3-9}$$

This expression is substituted into the integral in Equation 13.3-4 along with $p(\gamma_b)$
obtained from Equation 13.3-5. Evaluation of the resulting integral yields the probability
of error for binary DPSK, in the form

$$P_b = \frac{1}{2(1 + \bar{\gamma}_b)} \tag{13.3-10}$$


If we choose not to estimate the channel phase shift at all, but instead employ a
noncoherent (envelope or square-law) detector with binary, orthogonal FSK signals,
the error probability for a nonfading channel is

$$P_b(\gamma_b) = \tfrac{1}{2}\, e^{-\gamma_b/2} \tag{13.3-11}$$

When we average $P_b(\gamma_b)$ over the Rayleigh fading channel attenuation, the resulting
error probability is

$$P_b = \frac{1}{2 + \bar{\gamma}_b} \tag{13.3-12}$$


The error probabilities in Equations 13.3-7, 13.3-8, 13.3-10, and 13.3-12 are
illustrated in Figure 13.3-1. In comparing the performance of the four binary signaling
systems, we focus our attention on the probabilities of error for large SNR, i.e., $\bar{\gamma}_b \gg 1$.
Under this condition, the error rates in Equations 13.3-7, 13.3-8, 13.3-10, and 13.3-12
simplify to

$$P_b \approx \begin{cases} 1/4\bar{\gamma}_b & \text{for coherent PSK} \\ 1/2\bar{\gamma}_b & \text{for coherent, orthogonal FSK} \\ 1/2\bar{\gamma}_b & \text{for DPSK} \\ 1/\bar{\gamma}_b & \text{for noncoherent, orthogonal FSK} \end{cases} \tag{13.3-13}$$

FIGURE 13.3-1

Performance of binary signaling on a Rayleigh fading channel.


From Equation 13.3-13, we observe that coherent PSK is 3 dB better than DPSK
and 6 dB better than noncoherent FSK. More striking, however, is the observation that
the error rates decrease only inversely with SNR. In contrast, the decrease in error
rate on a nonfading channel is exponential with SNR. This means that, on a fading
channel, the transmitter must transmit a large amount of power in order to obtain a low
probability of error. In many cases, a large amount of power is not possible, technically
and/or economically. An alternative solution to the problem of obtaining acceptable
performance on a fading channel is the use of redundancy, which can be obtained by
means of diversity techniques, as discussed in Section 13.4.
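
The four Rayleigh-fading results and their large-SNR forms are easy to tabulate. The following Python sketch is a minimal illustration with hypothetical function names: it evaluates the closed-form expressions of Equations 13.3-7, 13.3-8, 13.3-10, and 13.3-12, the approximations collected in Equation 13.3-13, and the average SNR that the PSK approximation implies for a target error rate.

```python
import math


def rayleigh_error_rates(mean_snr):
    """Average bit error probabilities from Equations 13.3-7, 13.3-8, 13.3-10, 13.3-12."""
    return {
        "coherent PSK": 0.5 * (1.0 - math.sqrt(mean_snr / (1.0 + mean_snr))),
        "coherent FSK": 0.5 * (1.0 - math.sqrt(mean_snr / (2.0 + mean_snr))),
        "DPSK": 1.0 / (2.0 * (1.0 + mean_snr)),
        "noncoherent FSK": 1.0 / (2.0 + mean_snr),
    }


def high_snr_approximations(mean_snr):
    """Large-SNR forms collected in Equation 13.3-13."""
    return {
        "coherent PSK": 1.0 / (4.0 * mean_snr),
        "coherent FSK": 1.0 / (2.0 * mean_snr),
        "DPSK": 1.0 / (2.0 * mean_snr),
        "noncoherent FSK": 1.0 / mean_snr,
    }


def required_snr_db_psk(target_pb):
    """Average SNR (dB) at which coherent PSK reaches target_pb, using Pb ~ 1/(4 snr)."""
    return 10.0 * math.log10(1.0 / (4.0 * target_pb))


exact = rayleigh_error_rates(1000.0)         # 30 dB average SNR
approx = high_snr_approximations(1000.0)
snr_needed_db = required_snr_db_psk(1e-5)
```

Under the approximation, coherent PSK needs an average SNR of roughly 44 dB to reach an error rate of $10^{-5}$ on a Rayleigh fading channel, which illustrates the inverse, rather than exponential, dependence on SNR.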

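Nakagami fading If $\alpha$ is characterized statistically by the Nakagami-$m$ distribution,
the random variable $\gamma = \alpha^2 \mathcal{E}_b/N_0$ has the PDF (see Problem 13.14)

$$p(\gamma) = \frac{m^m}{\Gamma(m)\, \bar{\gamma}^m}\, \gamma^{m-1}\, e^{-m\gamma/\bar{\gamma}} \tag{13.3-14}$$

where $\bar{\gamma} = E(\alpha^2)\mathcal{E}_b/N_0$.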

The average probability of error for any of the modulation methods is simply 
obtained by averaging the appropriate error probability for a nonfading channel over 
the fading signal statistics. 

As an example of the performance obtained with Nakagami-$m$ fading statistics,
Figure 13.3-2 illustrates the probability of error of binary PSK with $m$ as a parameter.
We recall that $m = 1$ corresponds to Rayleigh fading. We observe that the performance
improves as $m$ is increased above $m = 1$, which is indicative of the fact that the fading
is less severe. On the other hand, when $m < 1$, the performance is worse than Rayleigh
fading.
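
The dependence on $m$ can be reproduced by averaging the nonfading PSK error probability over the Nakagami-$m$ SNR distribution numerically. The sketch below is one possible way to do this, not the method used to produce Figure 13.3-2: it assumes the gamma-type PDF $p(\gamma) = \frac{m^m}{\Gamma(m)\bar{\gamma}^m}\gamma^{m-1}e^{-m\gamma/\bar{\gamma}}$, applies the substitution $\gamma = u^2$ so that the integrand stays bounded at the origin when $m < 1$, and uses a midpoint rule. The function name, step count, and 10-dB operating point are hypothetical.

```python
import math


def psk_error_nakagami(m, mean_snr, steps=20000):
    """Average of Q(sqrt(2 g)) over the Nakagami-m SNR PDF.

    With g = u^2 and dg = 2 u du, and Q(sqrt(2) u) = erfc(u) / 2, the integrand
    becomes erfc(u) * scale * u^(2m - 1) * exp(-m u^2 / mean_snr).
    """
    if m < 0.5:
        raise ValueError("the Nakagami-m distribution requires m >= 1/2")
    scale = m ** m / (math.gamma(m) * mean_snr ** m)
    upper = 10.0  # erfc(10) is negligible, so the truncated tail is ignored
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += math.erfc(u) * scale * u ** (2.0 * m - 1.0) * math.exp(-m * u * u / mean_snr)
    return total * h


# Error probability at an average SNR of 10 dB for three values of m.
pb = {m: psk_error_nakagami(m, mean_snr=10.0) for m in (0.5, 1.0, 2.0)}
```

For $m = 1$ the result coincides with the Rayleigh closed form for coherent PSK, and the values decrease as $m$ increases, in line with the trend described above.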

FIGURE 13.3-2

Average error probability for two-phase PSK with Nakagami fading, plotted against the
average SNR $\bar{\gamma}_b$ (dB).


Other fading signal statistics Following the procedure described above, one can
determine the performance of the various modulation methods for other types of fading
signal statistics, such as Ricean fading.

Error probability results for Rice-distributed fading statistics can be found in the
paper by Lindsey (1964), while for Nakagami-$m$ fading statistics, the reader may refer
to the papers by Esposito (1967), Miyagaki et al. (1978), Charash (1979), Al-Hussaini
et al. (1985), and Beaulieu and Abu-Dayya (1991).


13.4 DIVERSITY TECHNIQUES FOR FADING MULTIPATH CHANNELS

Diversity techniques are based on the notion that errors occur in reception when the
channel attenuation is large, i.e., when the channel is in a deep fade. If we can supply
to the receiver several replicas of the same information signal transmitted over
independently fading channels, the probability that all the signal components will fade
simultaneously is reduced considerably. That is, if $p$ is the probability that any one
signal will fade below some critical value, then $p^L$ is the probability that all $L$
independently fading replicas of the same signal will fade below the critical value. There
are several ways in which we can provide the receiver with $L$ independently fading
replicas of the same information-bearing signal.
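
A short numerical example shows how quickly $p^L$ falls with the diversity order. The sketch below is illustrative only: it assumes Rayleigh fading on each branch, so that the SNR is exponentially distributed and the probability of falling below a threshold is $1 - e^{-\gamma_{th}/\bar{\gamma}}$. The function names and the 20-dB fade margin are hypothetical.

```python
import math


def fade_probability(threshold_snr, mean_snr):
    """P(gamma < threshold) for one Rayleigh-fading branch with exponential SNR PDF."""
    return 1.0 - math.exp(-threshold_snr / mean_snr)


def all_branches_fade(p_single, num_branches):
    """Probability that all L independently fading replicas fade together: p^L."""
    return p_single ** num_branches


# Hypothetical case: the critical value lies 20 dB below the average branch SNR.
p = fade_probability(threshold_snr=0.01, mean_snr=1.0)
joint = {L: all_branches_fade(p, L) for L in (1, 2, 4)}
```

Here a single branch is in a deep fade about 1 percent of the time, while four independent branches fade together with probability of about $10^{-8}$.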

One method is to employ frequency diversity. That is, the same information-bearing
signal is transmitted on $L$ carriers, where the separation between successive carriers
equals or exceeds the coherence bandwidth $(\Delta f)_c$ of the channel.

A second method for achieving $L$ independently fading versions of the same
information-bearing signal is to transmit the signal in $L$ different time slots, where
the separation between successive time slots equals or exceeds the coherence time
$(\Delta t)_c$ of the channel. This method is called time diversity.

Note that the fading channel fits the model of a bursty error channel. Furthermore,
we may view the transmission of the same information either at different frequencies or
in different time slots (or both) as a simple form of repetition coding. The separation
of the diversity transmissions in time by $(\Delta t)_c$ or in frequency by $(\Delta f)_c$ is basically
a form of block-interleaving the bits in the repetition code in an attempt to break up
the error bursts and, thus, to obtain independent errors. Later in the chapter, we shall
demonstrate that, in general, repetition coding is wasteful of bandwidth when compared
with nontrivial coding.

Another commonly used method for achieving diversity employs multiple anten- 
nas. For example, we may employ a single transmitting antenna and multiple receiving 
antennas. The latter must be spaced sufficiently far apart that the multipath components 
in the signal have significantly different propagation delays at the antennas. Usually a 
separation of a few wavelengths is required between two antennas in order to obtain 
signals that fade independently. 

A more sophisticated method for obtaining diversity is based on the use of a
signal having a bandwidth much greater than the coherence bandwidth $(\Delta f)_c$ of the
channel. Such a signal with bandwidth $W$ will resolve the multipath components and,
thus, provide the receiver with several independently fading signal paths. The time
resolution is $1/W$. Consequently, with a multipath spread of $T_m$ seconds, there are
$T_m W$ resolvable signal components. Since $T_m \approx 1/(\Delta f)_c$, the number of resolvable
signal components may also be expressed as $W/(\Delta f)_c$. Thus, the use of a wideband
signal may be viewed as just another method for obtaining frequency diversity of order
$L \approx W/(\Delta f)_c$. The optimum demodulator for processing the wideband signal will be
derived in Section 13.5. It is called a RAKE correlator or a RAKE matched filter and
was invented by Price and Green (1958).
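
The relation $L \approx T_m W$ can be turned into a one-line estimate. The sketch below is a rough illustration: the function name, the rounding to the nearest integer, and the floor of one path are assumptions, as are the 5-microsecond multipath spread and 1.25-MHz bandwidth used in the example.

```python
def resolvable_paths(multipath_spread_s, bandwidth_hz):
    """Approximate diversity order L ~ T_m * W, rounded, with at least one path."""
    return max(1, round(multipath_spread_s * bandwidth_hz))


# Hypothetical wideband case: T_m = 5 microseconds, W = 1.25 MHz.
wideband_order = resolvable_paths(5e-6, 1.25e6)

# Hypothetical narrowband case: W is far below 1/T_m, so the paths are not resolved.
narrowband_order = resolvable_paths(1e-6, 1e5)
```

In the narrowband case the product $T_m W$ is well below one and the signal sees a single fading path, which matches the frequency-nonselective model of Section 13.3.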

There are other diversity techniques that have received some consideration in prac- 
tice, such as angle-of-arrival diversity and polarization diversity. However, these have 
not been as widely used as those described above. 


13.4-1 Binary Signals 

We shall now determine the error rate performance for a binary digital communication 
system with diversity. We begin by describing the mathematical model for the com- 
munication system with diversity. First of all, we assume that there are L diversity 
channels, carrying the same information-bearing signal. Each channel is assumed to be 
frequency-nonselective and slowly fading with Rayleigh-distributed envelope statistics. 
The fading processes among the L diversity channels are assumed to be mutually statis- 
tically independent. The signal in each channel is corrupted by an additive zero-mean 
white Gaussian noise process. The noise processes in the L channels are assumed to be 
mutually statistically independent, with identical autocorrelation functions. Thus, the
equivalent lowpass received signals for the $L$ channels can be expressed in the form

$$r_{lk}(t) = \alpha_k e^{j\phi_k} s_{km}(t) + z_k(t), \qquad k = 1, 2, \ldots, L, \quad m = 1, 2 \tag{13.4-1}$$

where $\{\alpha_k e^{j\phi_k}\}$ represent the attenuation factors and phase shifts for the $L$ channels,
$s_{km}(t)$ denotes the $m$th signal transmitted on the $k$th channel, and $z_k(t)$ denotes the
additive white Gaussian noise on the $k$th channel. All signals in the set $\{s_{km}(t)\}$ have
the same energy.

The optimum demodulator for the signal received from the kth channel consists of 
two matched filters, one having the impulse response 

bk\it) = s* k] (T - t) (13.4-2) 

and the other having the impulse response 

b k2 (t) = S * k2 (T - t) (13.4-3) 

Of course, if binary PSK is the modulation method used to transmit the information, then
$s_{k1}(t) = -s_{k2}(t)$. Consequently, only a single matched filter is required for binary PSK.
Following the matched filters is a combiner that forms the two decision variables. The
combiner that achieves the best performance is one in which each matched filter output
is multiplied by the corresponding complex-valued (conjugate) channel gain
$\alpha_k e^{-j\phi_k}$. The effect of this multiplication is to compensate for the phase shift
in the channel and to weight the signal by a factor that is proportional to the signal
strength. Thus, a strong signal carries a larger weight than a weak signal. After the
complex-valued weighting operation is performed, two sums are formed. One consists of the
real parts of the weighted outputs from the matched filters corresponding to a transmitted 0.
The second consists of the real parts of the weighted outputs from the matched filters
corresponding to a transmitted 1. This optimum combiner is called a maximal ratio combiner by
Brennan (1959). Of course, the realization of this optimum combiner is based on the
assumption that the channel attenuations $\{\alpha_k\}$ and the phase shifts $\{\phi_k\}$ are
known perfectly. That is, the estimates of the parameters $\{\alpha_k\}$ and $\{\phi_k\}$ contain
no noise. (The effect of noisy estimates on the error rate performance of multiphase PSK is
considered in Appendix C.)
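
The weighting and summing performed by the maximal ratio combiner can be sketched in a few
lines of Python. This is only an illustration under stated assumptions: the function name
`mrc_decision_variable`, the example channel gains, and the normalization $2\mathcal{E} = 2$
are hypothetical choices made here, and the matched-filter outputs are taken to be noise-free.

```python
import cmath

def mrc_decision_variable(matched_filter_outputs, gains):
    """Return U = Re{ sum_k conj(g_k) * y_k }, with g_k = alpha_k * exp(j*phi_k)."""
    return sum((g.conjugate() * y).real
               for g, y in zip(gains, matched_filter_outputs))

# Hypothetical noise-free example: L = 3 branches, 2*E = 2.0, binary PSK.
TWO_E = 2.0
gains = [cmath.rect(0.9, 0.3), cmath.rect(0.2, -1.1), cmath.rect(1.4, 2.0)]
sent_zero = [TWO_E * g for g in gains]   # matched-filter outputs for a transmitted 0
sent_one = [-y for y in sent_zero]       # antipodal signal for a transmitted 1

u_zero = mrc_decision_variable(sent_zero, gains)
u_one = mrc_decision_variable(sent_one, gains)
print(u_zero, u_one)
```

Multiplying by the conjugate gain removes the phase shift on each branch, so in the absence
of noise the combiner output is $\pm 2\mathcal{E}\sum_k \alpha_k^2$ and the sign carries the
decision.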

A block diagram illustrating the model for the binary digital communication system 
described above is shown in Figure 13.4-1. 

Let us first consider the performance of binary PSK with Lth-order diversity. The
output of the maximal ratio combiner can be expressed as a single decision variable in
the form

$$U = \operatorname{Re}\left(2\mathcal{E}\sum_{k=1}^{L}\alpha_k^2 + \sum_{k=1}^{L}\alpha_k N_k\right) = 2\mathcal{E}\sum_{k=1}^{L}\alpha_k^2 + \sum_{k=1}^{L}\alpha_k N_{kr} \tag{13.4-4}$$

where $N_{kr}$ denotes the real part of the complex-valued Gaussian noise variable

$$N_k = e^{-j\phi_k}\int_0^T z_k(t)\, s_k^*(t)\, dt \tag{13.4-5}$$

We follow the approach used in Section 13.3 in deriving the probability of error. That is,
the probability of error conditioned on a fixed set of attenuation factors $\{\alpha_k\}$ is
obtained first. Then the conditional probability of error is averaged over the probability
density function of the $\{\alpha_k\}$.

FIGURE 13.4-1

Model of binary digital communication system with diversity.


Rayleigh fading For a fixed set of $\{\alpha_k\}$ the decision variable U is Gaussian with
mean

$$E(U) = 2\mathcal{E}\sum_{k=1}^{L}\alpha_k^2 \tag{13.4-6}$$

and variance

$$\sigma_U^2 = 2\mathcal{E}N_0\sum_{k=1}^{L}\alpha_k^2 \tag{13.4-7}$$


For these values of the mean and variance, the probability that U is less than zero is
simply

$$P_b(\gamma_b) = Q\left(\sqrt{2\gamma_b}\right) \tag{13.4-8}$$

where the SNR per bit, $\gamma_b$, is given as

$$\gamma_b = \frac{\mathcal{E}}{N_0}\sum_{k=1}^{L}\alpha_k^2 = \sum_{k=1}^{L}\gamma_k \tag{13.4-9}$$

where $\gamma_k = \mathcal{E}\alpha_k^2/N_0$ is the instantaneous SNR on the kth channel. Now we
must determine the probability density function $p(\gamma_b)$. This function is most easily
determined via the characteristic function of $\gamma_b$. First of all, we note that for L = 1,
$\gamma_b \equiv \gamma_1$ has a chi-square probability density function given in Equation
13.3-5. The characteristic function of $\gamma_1$ is easily shown to be

$$\Phi_{\gamma_1}(v) = E\left(e^{jv\gamma_1}\right) = \frac{1}{1 - jv\bar{\gamma}_c} \tag{13.4-10}$$

where $\bar{\gamma}_c$ is the average SNR per channel, which is assumed to be identical for all
channels. That is,

$$\bar{\gamma}_c = \frac{\mathcal{E}}{N_0}E(\alpha_k^2) \tag{13.4-11}$$

independent of k. This assumption applies for the results throughout this section. Since
the fading on the L channels is mutually statistically independent, the $\{\gamma_k\}$ are
statistically independent, and, hence, the characteristic function for the sum $\gamma_b$ is
simply the result in Equation 13.4-10 raised to the Lth power, i.e.,

$$\Phi_{\gamma_b}(v) = \frac{1}{(1 - jv\bar{\gamma}_c)^L} \tag{13.4-12}$$

But this is the characteristic function of a chi-square-distributed random variable with
2L degrees of freedom. It follows from Equation 2.3-21 that the probability density
function $p(\gamma_b)$ is

$$p(\gamma_b) = \frac{1}{(L-1)!\,\bar{\gamma}_c^L}\,\gamma_b^{L-1}e^{-\gamma_b/\bar{\gamma}_c} \tag{13.4-13}$$

The final step in this derivation is to average the conditional error probability given
in Equation 13.4-8 over the fading channel statistics. Thus, we evaluate the integral

$$P_b = \int_0^\infty P_b(\gamma_b)\,p(\gamma_b)\,d\gamma_b \tag{13.4-14}$$

There is a closed-form solution for Equation 13.4-14, which can be expressed as

$$P_b = \left[\tfrac{1}{2}(1-\mu)\right]^L \sum_{k=0}^{L-1}\binom{L-1+k}{k}\left[\tfrac{1}{2}(1+\mu)\right]^k \tag{13.4-15}$$

where, by definition

$$\mu = \sqrt{\frac{\bar{\gamma}_c}{1+\bar{\gamma}_c}} \tag{13.4-16}$$

When the average SNR per channel, $\bar{\gamma}_c$, satisfies the condition $\bar{\gamma}_c \gg 1$,
the term $\tfrac{1}{2}(1+\mu) \approx 1$ and the term $\tfrac{1}{2}(1-\mu) \approx 1/(4\bar{\gamma}_c)$.
Furthermore,

$$\sum_{k=0}^{L-1}\binom{L-1+k}{k} = \binom{2L-1}{L} \tag{13.4-17}$$

Therefore, when $\bar{\gamma}_c$ is sufficiently large (greater than 10 dB), the probability of
error in Equation 13.4-15 can be approximated as

$$P_b \approx \left(\frac{1}{4\bar{\gamma}_c}\right)^L\binom{2L-1}{L} \tag{13.4-18}$$

We observe from Equation 13.4-18 that the probability of error varies as $1/\bar{\gamma}_c$
raised to the Lth power. Thus, with diversity, the error rate decreases inversely with the
Lth power of the SNR.
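
As a numerical check on this derivation, the sketch below evaluates the closed form of
Equation 13.4-15, a trapezoidal approximation of the integral in Equation 13.4-14, and the
high-SNR approximation of Equation 13.4-18. The function names, the integration limit, and
the step count are assumptions chosen here for illustration; only the Python standard
library is used.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pb_closed_form(mu, L):
    """Equation 13.4-15 for a given mu and diversity order L."""
    a, b = 0.5 * (1.0 - mu), 0.5 * (1.0 + mu)
    return a ** L * sum(math.comb(L - 1 + k, k) * b ** k for k in range(L))

def pb_psk_by_integration(gc, L, upper=60.0, steps=60000):
    """Trapezoidal estimate of Equation 13.4-14 for binary PSK."""
    h = upper / steps
    norm = math.factorial(L - 1) * gc ** L

    def integrand(g):
        pdf = g ** (L - 1) * math.exp(-g / gc) / norm   # Equation 13.4-13
        return q_function(math.sqrt(2.0 * g)) * pdf     # Equation 13.4-8

    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, steps))
    return total * h

def pb_psk_high_snr(gc, L):
    """Equation 13.4-18."""
    return math.comb(2 * L - 1, L) / (4.0 * gc) ** L

gc, L = 10.0, 2   # average SNR per channel of 10 (10 dB), dual diversity
exact = pb_closed_form(math.sqrt(gc / (1.0 + gc)), L)
numeric = pb_psk_by_integration(gc, L)
approx = pb_psk_high_snr(gc, L)
print(exact, numeric, approx)
```

At 10 dB per channel the approximation is still somewhat pessimistic for L = 2; it tightens
as $\bar{\gamma}_c$ grows.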

Having obtained the performance of binary PSK with diversity, we now turn our
attention to binary, orthogonal FSK that is detected coherently. In this case, the two
decision variables at the output of the maximal ratio combiner may be expressed as

$$U_1 = \operatorname{Re}\left(2\mathcal{E}\sum_{k=1}^{L}\alpha_k^2 + \sum_{k=1}^{L}\alpha_k N_{k1}\right), \qquad U_2 = \operatorname{Re}\left(\sum_{k=1}^{L}\alpha_k N_{k2}\right) \tag{13.4-19}$$

where we have assumed that signal $s_{k1}(t)$ was transmitted and where $\{N_{k1}\}$ and
$\{N_{k2}\}$ are the two sets of noise components at the output of the matched filters. The
probability of error is simply the probability that $U_2 > U_1$. This computation is similar
to the one performed for PSK, except that we now have twice the noise power. Consequently,
when the $\{\alpha_k\}$ are fixed, the conditional probability of error is

$$P_b(\gamma_b) = Q\left(\sqrt{\gamma_b}\right) \tag{13.4-20}$$

We use Equation 13.4-13 to average $P_b(\gamma_b)$ over the fading. It is not surprising to find
that the result given in Equation 13.4-15 still applies, with $\bar{\gamma}_c$ replaced by
$\tfrac{1}{2}\bar{\gamma}_c$. That is, Equation 13.4-15 is the probability of error for binary,
orthogonal FSK with coherent detection, where the parameter $\mu$ is defined as

$$\mu = \sqrt{\frac{\bar{\gamma}_c}{2+\bar{\gamma}_c}} \tag{13.4-21}$$

Furthermore, for large values of $\bar{\gamma}_c$, the performance $P_b$ can be approximated as

$$P_b \approx \left(\frac{1}{2\bar{\gamma}_c}\right)^L\binom{2L-1}{L} \tag{13.4-22}$$

In comparing Equation 13.4-22 with Equation 13.4-18, we observe that the 3-dB
difference in performance between PSK and orthogonal FSK with coherent detection,
which exists in a nonfading, nondispersive channel, is the same also in a fading channel.

In the above discussion of binary PSK and FSK, detected coherently, we assumed
that noiseless estimates of the complex-valued channel parameters $\{\alpha_k e^{j\phi_k}\}$
were used at the receiver. Since the channel is time-variant, the parameters
$\{\alpha_k e^{j\phi_k}\}$ cannot be estimated perfectly. In fact, on some channels, the time
variations may be sufficiently fast to preclude the implementation of coherent detection.
In such a case, we should consider using either DPSK or FSK with noncoherent detection.

Let us consider DPSK first. In order for DPSK to be a viable digital signaling
method, the channel variations must be sufficiently slow so that the channel phase
shifts $\{\phi_k\}$ do not change appreciably over two consecutive signaling intervals. In our
analysis, we assume that the channel parameters $\{\alpha_k e^{j\phi_k}\}$ remain constant over
two successive signaling intervals. Thus the combiner for binary DPSK will yield as an
output the decision variable


$$U = \operatorname{Re}\left[\sum_{k=1}^{L}\left(2\mathcal{E}\alpha_k e^{j\phi_k} + N_{k2}\right)\left(2\mathcal{E}\alpha_k e^{-j\phi_k} + N_{k1}^*\right)\right] \tag{13.4-23}$$

where $\{N_{k1}\}$ and $\{N_{k2}\}$ denote the received noise components at the output of the
matched filters in the two consecutive signaling intervals. The probability of error is
simply the probability that U < 0. Since U is a special case of the general quadratic form
in complex-valued Gaussian random variables treated in Appendix B, the probability
of error can be obtained directly from the results given in that appendix. Alternatively,
we may use the error probability given in Equation 11.1-13, which applies to binary
DPSK transmitted over L time-invariant channels, and average it over the Rayleigh
fading channel statistics. Thus, we have the conditional error probability

$$P_b(\gamma_b) = \left(\tfrac{1}{2}\right)^{2L-1} e^{-\gamma_b}\sum_{k=0}^{L-1} b_k\,\gamma_b^k \tag{13.4-24}$$

where $\gamma_b$ is given by Equation 13.4-9 and

$$b_k = \frac{1}{k!}\sum_{n=0}^{L-1-k}\binom{2L-1}{n} \tag{13.4-25}$$

The average of $P_b(\gamma_b)$ over the fading channel statistics given by $p(\gamma_b)$ in
Equation 13.4-13 is easily shown to be

$$P_b = \frac{1}{2^{2L-1}(L-1)!\,(1+\bar{\gamma}_c)^L}\sum_{k=0}^{L-1} b_k\,(L-1+k)!\left(\frac{\bar{\gamma}_c}{1+\bar{\gamma}_c}\right)^k \tag{13.4-26}$$

We indicate that the result in Equation 13.4-26 can be manipulated into the form given
in Equation 13.4-15, which applies also to coherent PSK and FSK. For binary DPSK,
the parameter $\mu$ in Equation 13.4-15 is defined as (see Appendix C)

$$\mu = \frac{\bar{\gamma}_c}{1+\bar{\gamma}_c} \tag{13.4-27}$$

For $\bar{\gamma}_c \gg 1$, the error probability in Equation 13.4-26 can be approximated by the
expression

$$P_b \approx \left(\frac{1}{2\bar{\gamma}_c}\right)^L\binom{2L-1}{L} \tag{13.4-28}$$
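
The stated equivalence between the averaged DPSK error probability of Equation 13.4-26 and
the common form of Equation 13.4-15, with $\mu = \bar{\gamma}_c/(1+\bar{\gamma}_c)$, can be
checked numerically. The following sketch is only a sanity check; the function names are
hypothetical and the SNR value is an arbitrary example.

```python
import math

def pb_dpsk_sum(gc, L):
    """Equation 13.4-26: DPSK error probability averaged over Rayleigh fading."""
    def b(k):
        # Equation 13.4-25: b_k = (1/k!) * sum_{n=0}^{L-1-k} C(2L-1, n)
        return sum(math.comb(2 * L - 1, n) for n in range(L - k)) / math.factorial(k)

    x = gc / (1.0 + gc)
    total = sum(b(k) * math.factorial(L - 1 + k) * x ** k for k in range(L))
    return total / (2 ** (2 * L - 1) * math.factorial(L - 1) * (1.0 + gc) ** L)

def pb_mu_form(mu, L):
    """Equation 13.4-15 evaluated for a given mu."""
    a, c = 0.5 * (1.0 - mu), 0.5 * (1.0 + mu)
    return a ** L * sum(math.comb(L - 1 + k, k) * c ** k for k in range(L))

gc = 8.0   # example average SNR per channel (linear scale)
pairs = [(pb_dpsk_sum(gc, L), pb_mu_form(gc / (1.0 + gc), L)) for L in (1, 2, 3, 4)]
for summed, mu_form in pairs:
    print(summed, mu_form)
```

For L = 1 both expressions reduce to $1/[2(1+\bar{\gamma}_c)]$, the familiar result for binary
DPSK on a Rayleigh fading channel without diversity.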


Orthogonal FSK with noncoherent detection is the final signaling technique that
we consider in this section. It is appropriate for both slow and fast fading. However,
the analysis of the performance presented below is based on the assumption that the
fading is sufficiently slow so that the channel parameters $\{\alpha_k e^{j\phi_k}\}$ remain
constant for the duration of the signaling interval. The combiner for the multichannel
signals is a square-law combiner. Its output consists of the two decision variables

$$U_1 = \sum_{k=1}^{L}\left|2\mathcal{E}\alpha_k e^{j\phi_k} + N_{k1}\right|^2, \qquad U_2 = \sum_{k=1}^{L}\left|N_{k2}\right|^2 \tag{13.4-29}$$

where $U_1$ is assumed to contain the signal. Consequently the probability of error is the
probability that $U_2 > U_1$.

As in DPSK, we have a choice of two approaches in deriving the performance of
FSK with square-law combining. In Section 11.1, we indicated that the expression for
the error probability for square-law-combined FSK is the same as that for DPSK with
$\gamma_b$ replaced by $\tfrac{1}{2}\gamma_b$. That is, the FSK system requires 3 dB of additional
SNR to achieve the same performance on a time-invariant channel. Consequently, the
conditional error probability for DPSK given in Equation 13.4-24 applies to
square-law-combined FSK when $\gamma_b$ is replaced by $\tfrac{1}{2}\gamma_b$. Furthermore, the
result obtained by averaging Equation 13.4-24 over the fading, which is given by Equation
13.4-26, must also apply to FSK with $\bar{\gamma}_c$ replaced by $\tfrac{1}{2}\bar{\gamma}_c$. But
we also stated previously that Equations 13.4-26 and 13.4-15 are equivalent. Therefore, the
error probability given in Equation 13.4-15 also applies to square-law-combined FSK with the
parameter $\mu$ defined as

$$\mu = \frac{\bar{\gamma}_c}{2+\bar{\gamma}_c} \tag{13.4-30}$$

An alternative derivation used by Pierce (1958) to obtain the probability that the
decision variable $U_2 > U_1$ is just as easy as the method described above. It begins with
the probability density functions $p(U_1)$ and $p(U_2)$. Since the complex-valued random
variables $\{\alpha_k e^{j\phi_k}\}$, $\{N_{k1}\}$, and $\{N_{k2}\}$ are zero-mean
Gaussian-distributed, the decision variables $U_1$ and $U_2$ are distributed according to a
chi-square probability distribution with 2L degrees of freedom. That is,

$$p(U_1) = \frac{1}{(2\sigma_1^2)^L (L-1)!}\,U_1^{L-1}\exp\left(-\frac{U_1}{2\sigma_1^2}\right) \tag{13.4-31}$$

where

$$\sigma_1^2 = \tfrac{1}{2}E\left(\left|2\mathcal{E}\alpha_k e^{j\phi_k} + N_{k1}\right|^2\right) = 2\mathcal{E}N_0(1+\bar{\gamma}_c)$$

Similarly,

$$p(U_2) = \frac{1}{(2\sigma_2^2)^L (L-1)!}\,U_2^{L-1}\exp\left(-\frac{U_2}{2\sigma_2^2}\right) \tag{13.4-32}$$

where

$$\sigma_2^2 = 2\mathcal{E}N_0$$

The probability of error is just the probability that $U_2 > U_1$. It is left as an exercise
for the reader to show that this probability is given by Equation 13.4-15, where $\mu$ is
defined by Equation 13.4-30.

When $\bar{\gamma}_c \gg 1$, the performance of square-law-detected FSK can be simplified as
we have done for the other binary multichannel systems. In this case, the error rate is
well approximated by the expression

$$P_b \approx \left(\frac{1}{\bar{\gamma}_c}\right)^L\binom{2L-1}{L} \tag{13.4-33}$$

The error rate performance of PSK, DPSK, and square-law-detected orthogonal
FSK is illustrated in Figure 13.4-2 for L = 1, 2, and 4. The performance is plotted as
a function of the average SNR per bit, $\bar{\gamma}_b$, which is related to the average SNR per
channel, $\bar{\gamma}_c$, by the formula

$$\bar{\gamma}_b = L\bar{\gamma}_c \tag{13.4-34}$$

The results in Figure 13.4-2 clearly illustrate the advantage of diversity as a means for
overcoming the severe penalty in SNR caused by fading.

FIGURE 13.4-2

Performance of binary signals with diversity.

Nakagami fading It is a simple matter to extend the results of this section to
other fading models. We shall briefly consider Nakagami fading. Let us compare the
Nakagami PDF for the single-channel SNR parameter $\gamma_b = \alpha^2\mathcal{E}_b/N_0$,
previously given by Equation 13.3-14 as

$$p(\gamma_b) = \frac{1}{\Gamma(m)(\bar{\gamma}_b/m)^m}\,\gamma_b^{m-1}e^{-\gamma_b/(\bar{\gamma}_b/m)} \tag{13.4-35}$$

with the PDF $p(\gamma_b)$ obtained for the L-channel SNR with Rayleigh fading, given by
Equation 13.4-13 as

$$p(\gamma_b) = \frac{1}{(L-1)!\,\bar{\gamma}_c^L}\,\gamma_b^{L-1}e^{-\gamma_b/\bar{\gamma}_c} \tag{13.4-36}$$

By noting that $\bar{\gamma}_c = \bar{\gamma}_b/L$ in the case of an Lth-order diversity system,
it is clear that the two PDFs are identical for L = m = integer. When L = m = 1, the two
PDFs correspond to a single-channel Rayleigh fading system. For the case in which
the Nakagami parameter m = 2, the performance of the single-channel system is
identical to the performance obtained in a Rayleigh fading channel with dual (L = 2)
diversity. More generally, any single-channel system with Nakagami fading in which
the parameter m is an integer is equivalent to an L-channel diversity system for a
Rayleigh fading channel. In view of this equivalence, the characteristic function of a
Nakagami-m random variable must be of the form

$$\Phi_{\gamma_b}(v) = \frac{1}{(1 - jv\bar{\gamma}_b/m)^m} \tag{13.4-37}$$

which is consistent with the result given in Equation 13.4-12 for the characteristic
function of the combined signal in a system with Lth-order diversity in a Rayleigh
fading channel. Consequently, it follows that a K-channel system transmitting in a
Nakagami fading channel with independent fading is equivalent to an L = Km channel
diversity system in a Rayleigh fading channel.
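
The equivalence between the two PDFs for L = m can be confirmed pointwise with a short
script. This is a minimal sketch; the function names and the sample values are assumptions
made for illustration.

```python
import math

def nakagami_snr_pdf(g, m, gb_avg):
    """Equation 13.4-35: PDF of the SNR for Nakagami-m fading with average SNR gb_avg."""
    scale = gb_avg / m
    return g ** (m - 1) * math.exp(-g / scale) / (math.gamma(m) * scale ** m)

def rayleigh_diversity_snr_pdf(g, L, gc_avg):
    """Equation 13.4-36: PDF of the combined SNR for L Rayleigh channels."""
    return g ** (L - 1) * math.exp(-g / gc_avg) / (math.factorial(L - 1) * gc_avg ** L)

L = m = 3
gb_avg = 12.0          # average SNR per bit (example value)
gc_avg = gb_avg / L    # average SNR per channel
points = [0.5, 2.0, 7.5, 20.0]
differences = [abs(nakagami_snr_pdf(g, m, gb_avg) - rayleigh_diversity_snr_pdf(g, L, gc_avg))
               for g in points]
print(differences)
```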


13.4-2 Multiphase Signals 

Multiphase signaling over a Rayleigh fading channel is the topic presented in some
detail in Appendix C. Our main purpose in this section is to cite the general result for
the probability of a symbol error in M-ary PSK and DPSK systems and the probability
of a bit error in four-phase PSK and DPSK.

The general result for the probability of a symbol error in M-ary PSK and DPSK is

$$P_e = \frac{(-1)^{L-1}(1-\mu^2)^L}{\pi (L-1)!}\left(\frac{\partial^{L-1}}{\partial b^{L-1}}\left\{\frac{1}{b-\mu^2}\left[\frac{\pi}{M}(M-1) - \frac{\mu\sin(\pi/M)}{\sqrt{b-\mu^2\cos^2(\pi/M)}}\cot^{-1}\frac{-\mu\cos(\pi/M)}{\sqrt{b-\mu^2\cos^2(\pi/M)}}\right]\right\}\right)_{b=1} \tag{13.4-38}$$

where

$$\mu = \sqrt{\frac{\bar{\gamma}_c}{1+\bar{\gamma}_c}} \tag{13.4-39}$$

for coherent PSK and

$$\mu = \frac{\bar{\gamma}_c}{1+\bar{\gamma}_c} \tag{13.4-40}$$

for DPSK. Again $\bar{\gamma}_c$ is the average received SNR per channel. The SNR per bit is
$\bar{\gamma}_b = L\bar{\gamma}_c/k$, where $k = \log_2 M$.

The bit error rate for four-phase PSK and DPSK is derived on the basis that the
pair of information bits is mapped into the four phases according to a Gray code. The
expression for the bit error rate derived in Appendix C is

$$P_b = \frac{1}{2}\left[1 - \frac{\mu}{\sqrt{2-\mu^2}}\sum_{k=0}^{L-1}\binom{2k}{k}\left(\frac{1-\mu^2}{4-2\mu^2}\right)^k\right] \tag{13.4-41}$$

where $\mu$ is again given by Equations 13.4-39 and 13.4-40 for PSK and DPSK,
respectively.
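
A short sketch of the four-phase bit error rate follows. It assumes the expression for
Equation 13.4-41 in the form written in the docstring below (the sum over central binomial
coefficients), and it checks that four-phase coherent PSK then has the same bit error rate
as binary PSK at the same SNR per bit. The function names are hypothetical.

```python
import math

def four_phase_bit_error(mu, L):
    """Assumed form of Equation 13.4-41:
    P_b = 0.5 * [1 - mu/sqrt(2 - mu^2) * sum_k C(2k, k) * ((1 - mu^2)/(4 - 2 mu^2))^k]."""
    ratio = (1.0 - mu * mu) / (4.0 - 2.0 * mu * mu)
    series = sum(math.comb(2 * k, k) * ratio ** k for k in range(L))
    return 0.5 * (1.0 - mu / math.sqrt(2.0 - mu * mu) * series)

def mu_psk(gc):
    """Equation 13.4-39 (coherent PSK)."""
    return math.sqrt(gc / (1.0 + gc))

def mu_dpsk(gc):
    """Equation 13.4-40 (DPSK)."""
    return gc / (1.0 + gc)

def binary_psk_bit_error(gc, L):
    """Equation 13.4-15 with mu from Equation 13.4-16."""
    mu = math.sqrt(gc / (1.0 + gc))
    a, b = 0.5 * (1.0 - mu), 0.5 * (1.0 + mu)
    return a ** L * sum(math.comb(L - 1 + k, k) * b ** k for k in range(L))

snr_per_bit_per_channel = 10.0
gc_four_phase = 2.0 * snr_per_bit_per_channel     # k = 2 bits per symbol
for L in (1, 2):
    print(L,
          four_phase_bit_error(mu_psk(gc_four_phase), L),
          binary_psk_bit_error(snr_per_bit_per_channel, L),
          four_phase_bit_error(mu_dpsk(gc_four_phase), L))
```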

Figure 13.4-3 illustrates the probability of a symbol error of DPSK and coherent
PSK for M = 2, 4, and 8 with L = 1. Note that the difference in performance between
DPSK and coherent PSK is approximately 3 dB for all three values of M. In fact, when
$\bar{\gamma}_b \gg 1$ and L = 1, Equation 13.4-38 is well approximated as

$$P_e \approx \frac{M-1}{(M\log_2 M)\left[\sin^2(\pi/M)\right]\bar{\gamma}_b} \tag{13.4-42}$$

for DPSK and as

$$P_e \approx \frac{M-1}{(M\log_2 M)\left[\sin^2(\pi/M)\right]2\bar{\gamma}_b} \tag{13.4-43}$$

for PSK. Hence, at high SNR, coherent PSK is 3 dB better than DPSK on a Rayleigh
fading channel. This difference also holds as L is increased.
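
The 3-dB statement follows directly from the ratio of the two high-SNR approximations, as
the following sketch shows for M = 2, 4, and 8. The function names and the SNR value are
assumptions made for illustration.

```python
import math

def pe_dpsk_high_snr(M, gb):
    """Equation 13.4-42 (L = 1, large average SNR per bit gb)."""
    return (M - 1) / (M * math.log2(M) * math.sin(math.pi / M) ** 2 * gb)

def pe_psk_high_snr(M, gb):
    """Equation 13.4-43 (L = 1, large average SNR per bit gb)."""
    return (M - 1) / (M * math.log2(M) * math.sin(math.pi / M) ** 2 * 2.0 * gb)

gb = 100.0   # 20 dB average SNR per bit
penalties_db = []
for M in (2, 4, 8):
    ratio = pe_dpsk_high_snr(M, gb) / pe_psk_high_snr(M, gb)
    # With P_e proportional to 1/gb, an error-rate ratio maps one-to-one to an SNR ratio.
    penalties_db.append(10.0 * math.log10(ratio))
print(penalties_db)
```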

Bit error probabilities are depicted in Figure 13.4-4 for two-phase, four-phase,
and eight-phase DPSK signaling with L = 1, 2, and 4. The expression for the bit
error probability of eight-phase DPSK with Gray encoding is not given here, but it is
available in the paper by Proakis (1968). In this case, we observe that the performances
for two- and four-phase DPSK are (approximately) the same, while that for eight-phase
DPSK is about 3 dB poorer. Although we have not shown the bit error probability for
coherent PSK, it can be demonstrated that two- and four-phase coherent PSK also yield
approximately the same performance.

FIGURE 13.4-3

Probability of symbol error for PSK and DPSK for Rayleigh fading.


13.4-3 M-ary Orthogonal Signals

In this subsection, we determine the performance of M-ary orthogonal signals
transmitted over a Rayleigh fading channel and we assess the advantages of higher-order
signal alphabets relative to a binary alphabet. The orthogonal signals may be viewed as
M-ary FSK with a minimum frequency separation of an integer multiple of 1/T, where
T is the signaling interval. The same information-bearing signal is transmitted on L
diversity channels. Each diversity channel is assumed to be frequency-nonselective and
slowly fading, and the fading processes on the L channels are assumed to be mutually
statistically independent. An additive white Gaussian noise process corrupts the signal
on each diversity channel. We assume that the additive noise processes are mutually
statistically independent.

FIGURE 13.4-4

Probability of a bit error for DPSK with diversity for Rayleigh fading.

Although it is relatively easy to formulate the structure and analyze the performance
of a maximal ratio combiner for the diversity channels in the M-ary communication
system, it is more likely that a practical system would employ noncoherent detection.
Consequently, we confine our attention to square-law combining of the diversity signals.
The output of the combiner containing the signal is

$$U_1 = \sum_{k=1}^{L}\left|2\mathcal{E}\alpha_k e^{j\phi_k} + N_{k1}\right|^2 \tag{13.4-44}$$

while the outputs of the remaining M − 1 combiners are

$$U_m = \sum_{k=1}^{L}\left|N_{km}\right|^2, \qquad m = 2, 3, 4, \ldots, M \tag{13.4-45}$$

The probability of error is simply 1 minus the probability that $U_1 > U_m$ for
m = 2, 3, ..., M. Since the signals are orthogonal and the additive noise processes are
mutually statistically independent, the random variables $U_1, U_2, \ldots, U_M$ are also
mutually statistically independent. The probability density function of $U_1$ was given in
Equation 13.4-31. On the other hand, $U_2, \ldots, U_M$ are identically distributed and
described by the marginal probability density function in Equation 13.4-32. With $U_1$ fixed,
the joint probability $P(U_2 < U_1, U_3 < U_1, \ldots, U_M < U_1)$ is equal to $P(U_2 < U_1)$
raised to the M − 1 power. Now,

$$P(U_2 < u_1 \mid U_1 = u_1) = \int_0^{u_1} p(u_2)\,du_2 = 1 - \exp\left(-\frac{u_1}{2\sigma_2^2}\right)\sum_{k=0}^{L-1}\frac{1}{k!}\left(\frac{u_1}{2\sigma_2^2}\right)^k \tag{13.4-46}$$

where $\sigma_2^2 = 2\mathcal{E}N_0$. The M − 1 power of this probability is then averaged over
the probability density function of $U_1$ to yield the probability of a correct decision. If we
subtract this result from unity, we obtain the probability of error in the form given by
Hahn (1962)

$$\begin{aligned}
P_e &= 1 - \int_0^\infty \frac{1}{(2\sigma_1^2)^L (L-1)!}\,u_1^{L-1}\exp\left(-\frac{u_1}{2\sigma_1^2}\right)\left[1 - \exp\left(-\frac{u_1}{2\sigma_2^2}\right)\sum_{k=0}^{L-1}\frac{1}{k!}\left(\frac{u_1}{2\sigma_2^2}\right)^k\right]^{M-1} du_1 \\
&= 1 - \int_0^\infty \frac{1}{(1+\bar{\gamma}_c)^L (L-1)!}\,u_1^{L-1}\exp\left(-\frac{u_1}{1+\bar{\gamma}_c}\right)\left(1 - e^{-u_1}\sum_{k=0}^{L-1}\frac{u_1^k}{k!}\right)^{M-1} du_1
\end{aligned} \tag{13.4-47}$$

where $\bar{\gamma}_c$ is the average SNR per diversity channel. The average SNR per bit is
$\bar{\gamma}_b = L\bar{\gamma}_c/\log_2 M = L\bar{\gamma}_c/k$.

The integral in Equation 13.4-47 can be expressed in closed form as a double
summation. This can be seen if we write

$$\left(\sum_{k=0}^{L-1}\frac{u_1^k}{k!}\right)^m = \sum_{k=0}^{m(L-1)}\beta_{km}\,u_1^k \tag{13.4-48}$$

where $\beta_{km}$ is the set of coefficients in the above expansion. Then it follows that
Equation 13.4-47 reduces to

$$P_e = \frac{1}{(L-1)!}\sum_{m=1}^{M-1}\frac{(-1)^{m+1}\binom{M-1}{m}}{(1+m+m\bar{\gamma}_c)^L}\sum_{k=0}^{m(L-1)}\beta_{km}\,(L-1+k)!\left(\frac{1+\bar{\gamma}_c}{1+m+m\bar{\gamma}_c}\right)^k \tag{13.4-49}$$

When there is no diversity (L = 1), the error probability in Equation 13.4-49 reduces
to the simple form

$$P_e = \sum_{m=1}^{M-1}\frac{(-1)^{m+1}\binom{M-1}{m}}{1+m+m\bar{\gamma}_c} \tag{13.4-50}$$

The symbol error rate $P_e$ may be converted to an equivalent bit error rate by multiplying
$P_e$ by $2^{k-1}/(2^k - 1)$.

Although the expression for $P_e$ given in Equation 13.4-49 is in closed form, it is
computationally cumbersome to evaluate for large values of M and L. An alternative
is to evaluate $P_e$ by numerical integration using the expression in Equation 13.4-47.
The results illustrated in the following graphs were generated from Equation 13.4-47.
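
For small M and L the double summation of Equation 13.4-49 is straightforward to program.
The sketch below builds the coefficients $\beta_{km}$ of Equation 13.4-48 by repeated
polynomial multiplication. The function names are hypothetical, and because the outer sum
alternates in sign, double-precision arithmetic is an assumption that holds only for modest
M and L.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pe_orthogonal(M, L, gc):
    """Equation 13.4-49: symbol error probability, square-law combining, Rayleigh fading."""
    base = [1.0 / math.factorial(k) for k in range(L)]   # sum_{k<L} u^k / k!
    beta = [1.0]                                         # coefficients of base**0
    total = 0.0
    for m in range(1, M):
        beta = poly_mul(beta, base)                      # beta_{km} of Equation 13.4-48
        d = 1.0 + m + m * gc
        inner = sum(bk * math.factorial(L - 1 + k) * ((1.0 + gc) / d) ** k
                    for k, bk in enumerate(beta))
        total += (-1) ** (m + 1) * math.comb(M - 1, m) * inner / d ** L
    return total / math.factorial(L - 1)

def bit_error_from_symbol_error(pe, M):
    """Multiply by 2^(k-1) / (2^k - 1), with M = 2^k."""
    return pe * (M / 2) / (M - 1)

gc = 10.0
pe_4_1 = pe_orthogonal(4, 1, gc)
pe_2_3 = pe_orthogonal(2, 3, gc)
print(pe_4_1, bit_error_from_symbol_error(pe_4_1, 4), pe_2_3)
```

For L = 1 the routine reproduces Equation 13.4-50, and for M = 2 it reproduces the binary
result of Equation 13.4-15 with $\mu = \bar{\gamma}_c/(2+\bar{\gamma}_c)$.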

FIGURE 13.4-5

Performance of square-law-detected binary orthogonal signals as a function of diversity.

FIGURE 13.4-6

Performance of square-law-detected M = 4 orthogonal signals as a function of diversity.

First of all, let us observe the error rate performance of M-ary orthogonal signaling
with square-law combining as a function of the order of diversity. Figures 13.4-5 and
13.4-6 illustrate the characteristics of $P_e$ for M = 2 and 4 as a function of L when the
total SNR, defined as $\bar{\gamma}_t = L\bar{\gamma}_c$, remains fixed. These results indicate that
there is an optimum order of diversity for each $\bar{\gamma}_t$. That is, for any
$\bar{\gamma}_t$, there is a value of L for which $P_e$ is a minimum. A careful observation of
these graphs reveals that the minimum in $P_e$ is obtained when
$\bar{\gamma}_c = \bar{\gamma}_t/L \approx 3$. This result appears to be independent of the
alphabet size M.
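
The existence of an optimum order of diversity can be explored numerically for the binary
case. The sketch below searches over L for a fixed total SNR, using the exact binary
expression of Equation 13.4-15 with $\mu = \bar{\gamma}_c/(2+\bar{\gamma}_c)$ written in terms
of $p = 1/(2+\bar{\gamma}_c)$. The function names, the search range, and the total SNR value
are assumptions chosen for illustration.

```python
import math

def pb_binary_square_law(L, gc):
    """Binary orthogonal signaling with square-law combining of L Rayleigh channels."""
    p = 1.0 / (2.0 + gc)
    return p ** L * sum(math.comb(L - 1 + k, k) * (1.0 - p) ** k for k in range(L))

def best_diversity_order(total_snr, max_L=40):
    """Return the L in 1..max_L that minimizes the error rate when L * gc = total_snr."""
    return min(range(1, max_L + 1),
               key=lambda L: pb_binary_square_law(L, total_snr / L))

total_snr = 30.0   # example total SNR (linear scale)
L_opt = best_diversity_order(total_snr)
print(L_opt, total_snr / L_opt, pb_binary_square_law(L_opt, total_snr / L_opt))
```

For this example the search settles on a per-channel SNR in the neighborhood of 3, in line
with the observation drawn from the graphs.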

Second, let us observe the error rate $P_e$ as a function of the average SNR per bit,
defined as $\bar{\gamma}_b = L\bar{\gamma}_c/k$. (If we interpret M-ary orthogonal FSK as a form
of coding and the order of diversity as the number of times a symbol is repeated in a
repetition code, then $\bar{\gamma}_b = \bar{\gamma}_c/R_c$, where $R_c = k/L$ is the code rate.)
The graphs of $P_e$ versus $\bar{\gamma}_b$ for M = 2, 4, 8, 16, 32 and L = 1, 2, 4 are shown in
Figure 13.4-7. These results illustrate the gain in performance as M increases and L
increases. First, we note that a significant gain in performance is obtained by increasing L.
Second, we note that the gain in performance obtained with an increase in M is relatively
small when L is small. However, as L increases, the gain achieved by increasing M also
increases. Since an increase in either parameter results in an expansion of bandwidth, i.e.,

$$B_e = \frac{LM}{\log_2 M} \tag{13.4-51}$$

the results illustrated in Figure 13.4-7 indicate that an increase in L is more efficient than
a corresponding increase in M. As we shall see in Chapter 14, coding is a
bandwidth-effective means for obtaining diversity in the signal transmitted over the fading
channel.

FIGURE 13.4-7

Performance of orthogonal signaling with M and L as parameters.


Chernov bound Before concluding this section, we develop a Chernov upper
bound on the error probability of binary orthogonal signaling with Lth-order diversity,
which will be useful in our discussion of coding for fading channels, the topic
of Chapter 14. Our starting point is the expression for the two decision variables $U_1$
and $U_2$ given by Equation 13.4-29, where $U_1$ consists of the square-law-combined
signal-plus-noise terms and $U_2$ consists of square-law-combined noise terms. The binary
probability of error, denoted here by $P_b(L)$, is

$$P_b(L) = P(U_2 - U_1 > 0) = P(X > 0) = \int_0^\infty p(x)\,dx \tag{13.4-52}$$

where the random variable X is defined as

$$X = U_2 - U_1 = \sum_{k=1}^{L}\left(\left|N_{k2}\right|^2 - \left|2\mathcal{E}\alpha_k + N_{k1}\right|^2\right) \tag{13.4-53}$$

The phase terms $\{\phi_k\}$ in $U_1$ have been dropped since they do not affect the performance
of the square-law detector.

Using the Chernov bound, the error probability in Equation 13.4-52 can be expressed in the
form

$$P_b(L) \le E\left(e^{\zeta X}\right) \tag{13.4-54}$$

where the parameter $\zeta > 0$ is optimized to yield a tight bound. Upon substituting for
the random variable X from Equation 13.4-53 and noting that the random variables in
the summation are mutually statistically independent, we obtain the result

$$P_b(L) \le \prod_{k=1}^{L} E\left(e^{\zeta |N_{k2}|^2}\right) E\left(e^{-\zeta |2\mathcal{E}\alpha_k + N_{k1}|^2}\right) \tag{13.4-55}$$

But

$$E\left(e^{\zeta |N_{k2}|^2}\right) = \frac{1}{1 - 2\zeta\sigma_2^2}, \qquad \zeta < \frac{1}{2\sigma_2^2} \tag{13.4-56}$$

and

$$E\left(e^{-\zeta |2\mathcal{E}\alpha_k + N_{k1}|^2}\right) = \frac{1}{1 + 2\zeta\sigma_1^2}, \qquad \zeta > \frac{-1}{2\sigma_1^2} \tag{13.4-57}$$

where $\sigma_2^2 = 2\mathcal{E}N_0$, $\sigma_1^2 = 2\mathcal{E}N_0(1 + \bar{\gamma}_c)$, and
$\bar{\gamma}_c$ is the average SNR per diversity channel. Note that $\sigma_1^2$ and
$\sigma_2^2$ are independent of k, i.e., the additive noise terms on the L diversity channels
as well as the fading statistics are identically distributed. Consequently, Equation 13.4-55
reduces to


$$P_b(L) \le \left[\frac{1}{(1 - 2\zeta\sigma_2^2)(1 + 2\zeta\sigma_1^2)}\right]^L \tag{13.4-58}$$

By differentiating the right-hand side of Equation 13.4-58 with respect to $\zeta$, we
find that the upper bound is minimized when

$$\zeta = \frac{\sigma_1^2 - \sigma_2^2}{4\sigma_1^2\sigma_2^2} \tag{13.4-59}$$

Substitution of Equation 13.4-59 for $\zeta$ into Equation 13.4-58 yields the Chernov upper
bound in the form

$$P_b(L) \le \left[\frac{4(1 + \bar{\gamma}_c)}{(2 + \bar{\gamma}_c)^2}\right]^L \tag{13.4-60}$$

It is interesting to note that Equation 13.4-60 may also be expressed as

$$P_b(L) \le [4p(1 - p)]^L \tag{13.4-61}$$

where $p = 1/(2 + \bar{\gamma}_c)$ is the probability of error for binary orthogonal signaling
on a fading channel without diversity.

FIGURE 13.4-8

Comparison of Chernov bound with exact error probability.

A comparison of the Chernov bound in Equation 13.4-60 with the exact error
probability for binary orthogonal signaling and square-law combining of the L diversity
signals, which is given by the expression

$$P_b(L) = p^L\sum_{k=0}^{L-1}\binom{L-1+k}{k}(1-p)^k \tag{13.4-62}$$

reveals the tightness of the bound. Figure 13.4-8 illustrates this comparison. We observe
that the Chernov upper bound is approximately 6 dB from the exact error probability
for L = 1, but, as L increases, it becomes tighter. For example, the difference between
the bound and the exact error probability is about 2.5 dB when L = 4.
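
The tightness of the bound can be examined numerically. The sketch below evaluates the
Chernov bound of Equation 13.4-60 and the exact expression of Equation 13.4-62, and converts
their ratio into an approximate SNR gap. The conversion assumes the high-SNR behavior
$P_b \propto \bar{\gamma}_c^{-L}$, so it is only an estimate; the function names and the SNR
value are hypothetical.

```python
import math

def chernov_bound(L, gc):
    """Equation 13.4-60."""
    return (4.0 * (1.0 + gc) / (2.0 + gc) ** 2) ** L

def pb_exact(L, gc):
    """Equation 13.4-62 with p = 1 / (2 + gc)."""
    p = 1.0 / (2.0 + gc)
    return p ** L * sum(math.comb(L - 1 + k, k) * (1.0 - p) ** k for k in range(L))

def snr_gap_db(L, gc):
    """Approximate horizontal gap in dB, assuming P_b varies as gc**(-L)."""
    return 10.0 * math.log10(chernov_bound(L, gc) / pb_exact(L, gc)) / L

gc = 1.0e4   # 40 dB per channel, well inside the high-SNR regime
gaps = {L: snr_gap_db(L, gc) for L in (1, 2, 4)}
print(gaps)
```

Under this high-SNR estimate the gap is close to 6 dB for L = 1 and shrinks as L increases,
consistent with the trend in the comparison above.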

Finally we mention that the error probability for M-ary orthogonal signaling with
diversity can be upper-bounded by means of the union bound

$$P_e < (M - 1)P_b(L) \tag{13.4-63}$$

where we may use either the exact expression given in Equation 13.4-62 or the Chernov
bound in Equation 13.4-60 for $P_b(L)$.

13.5

SIGNALING OVER A FREQUENCY-SELECTIVE, SLOWLY FADING
CHANNEL: THE RAKE DEMODULATOR

When the spread factor of the channel satisfies the condition $T_m B_d \ll 1$, it is possible to
select signals having a bandwidth $W \ll (\Delta f)_c$ and a signal duration
$T \ll (\Delta t)_c$. Thus, the channel is frequency-nonselective and slowly fading. In such a
channel, diversity techniques can be employed to overcome the severe consequences of fading.

When a bandwidth $W \gg (\Delta f)_c$ is available to the user, the channel can be
subdivided into a number of frequency-division multiplexed (FDM) subchannels having a
mutual separation in center frequencies of at least $(\Delta f)_c$. Then the same signal can be
transmitted on the FDM subchannels, and, thus, frequency diversity is obtained. In this
section, we describe an alternative method.


13.5-1 A Tapped-Delay-Line Channel Model 

As we shall now demonstrate, a more direct method for achieving basically the same
results is to employ a wideband signal covering the bandwidth W. The channel is
still assumed to be slowly fading by virtue of the assumption that $T \ll (\Delta t)_c$. Now
suppose that W is the bandwidth occupied by the real band-pass signal. Then the
band occupancy of the equivalent low-pass signal $s_l(t)$ is $|f| \le \tfrac{1}{2}W$. Since
$s_l(t)$ is band-limited to $|f| \le \tfrac{1}{2}W$, application of the sampling theorem results
in the signal representation

$$s_l(t) = \sum_{n=-\infty}^{\infty} s_l\left(\frac{n}{W}\right)\frac{\sin[\pi W(t - n/W)]}{\pi W(t - n/W)} \tag{13.5-1}$$

The Fourier transform of $s_l(t)$ is

$$S_l(f) = \begin{cases}\dfrac{1}{W}\displaystyle\sum_{n=-\infty}^{\infty} s_l(n/W)\,e^{-j2\pi fn/W} & |f| \le \tfrac{1}{2}W \\[2ex] 0 & |f| > \tfrac{1}{2}W\end{cases} \tag{13.5-2}$$

The noiseless received signal from a frequency-selective channel was previously
expressed in the form

$$r_l(t) = \int_{-\infty}^{\infty} C(f; t)\,S_l(f)\,e^{j2\pi ft}\,df \tag{13.5-3}$$

where $C(f; t)$ is the time-variant transfer function. Substitution for $S_l(f)$ from
Equation 13.5-2 into Equation 13.5-3 yields

$$r_l(t) = \frac{1}{W}\sum_{n=-\infty}^{\infty} s_l(n/W)\int_{-\infty}^{\infty} C(f; t)\,e^{j2\pi f(t - n/W)}\,df = \frac{1}{W}\sum_{n=-\infty}^{\infty} s_l(n/W)\,c(t - n/W; t) \tag{13.5-4}$$

where $c(\tau; t)$ is the time-variant impulse response. We observe that Equation 13.5-4
has the form of a convolution sum. Hence, it can also be expressed in the alternative
form

$$r_l(t) = \frac{1}{W}\sum_{n=-\infty}^{\infty} s_l(t - n/W)\,c(n/W; t) \tag{13.5-5}$$

It is convenient to define a set of time-variable channel coefficients as

$$c_n(t) = \frac{1}{W}\,c\left(\frac{n}{W}; t\right) \tag{13.5-6}$$

Then Equation 13.5-5 expressed in terms of these channel coefficients becomes

$$r_l(t) = \sum_{n=-\infty}^{\infty} c_n(t)\,s_l(t - n/W) \tag{13.5-7}$$

The form for the received signal in Equation 13.5-7 implies that the time-variant
frequency-selective channel can be modeled or represented as a tapped delay line with
tap spacing 1/W and tap weight coefficients $\{c_n(t)\}$. In fact, we deduce from Equation
13.5-7 that the low-pass impulse response for the channel is

$$c(\tau; t) = \sum_{n=-\infty}^{\infty} c_n(t)\,\delta(\tau - n/W) \tag{13.5-8}$$

and the corresponding time-variant transfer function is

$$C(f; t) = \sum_{n=-\infty}^{\infty} c_n(t)\,e^{-j2\pi fn/W} \tag{13.5-9}$$

Thus, with an equivalent low-pass signal having a bandwidth $\tfrac{1}{2}W$, where
$W \gg (\Delta f)_c$, we achieve a resolution of 1/W in the multipath delay profile. Since the
total multipath spread is $T_m$, for all practical purposes the tapped delay line model for the
channel can be truncated at $L = \lfloor T_m W \rfloor + 1$ taps. Then the noiseless received
signal can be expressed in the form

$$r_l(t) = \sum_{n=1}^{L} c_n(t)\,s_l(t - n/W) \tag{13.5-10}$$

FIGURE 13.5-1

Tapped delay line model of frequency-selective channel.


The truncated tapped delay line model is shown in Figure 13.5-1. In accordance
with the statistical characterization of the channel presented in Section 13.1, the
time-variant tap weights $\{c_n(t)\}$ are complex-valued stationary random processes. In the
special case of Rayleigh fading, the magnitudes $|c_n(t)| \equiv \alpha_n(t)$ are
Rayleigh-distributed and the phases $\phi_n(t)$ are uniformly distributed. Since the
$\{c_n(t)\}$ represent the tap weights corresponding to the L different delays
$\tau = n/W$, n = 1, 2, ..., L, the uncorrelated scattering assumption made in Section 13.1
implies that the $\{c_n(t)\}$ are mutually uncorrelated. When the $\{c_n(t)\}$ are Gaussian
random processes, they are statistically independent.
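
A discrete-time version of the truncated tapped delay line model is easy to sketch. In the
code below the signal is represented by samples spaced 1/W apart, and the tap weights are
held fixed over the block, which corresponds to the slow-fading assumption; a time-variant
simulation would redraw them from block to block. The function names and the numerical
values are assumptions made for illustration.

```python
import math

def num_taps(multipath_spread, bandwidth):
    """L = floor(T_m * W) + 1 taps for the truncated tapped delay line."""
    return math.floor(multipath_spread * bandwidth) + 1

def tapped_delay_line(samples, taps):
    """Noiseless channel output r[i] = sum_{n=1}^{L} c_n * s[i - n].

    samples : signal samples spaced 1/W apart
    taps    : complex tap weights [c_1, ..., c_L], assumed constant over the block
    """
    L = len(taps)
    out = [0j] * (len(samples) + L)
    for i, s in enumerate(samples):
        for n, c in enumerate(taps, start=1):
            out[i + n] += c * s
    return out

# Hypothetical example: T_m = 2.5 microseconds, W = 1 MHz.
L = num_taps(2.5e-6, 1.0e6)
impulse_response = tapped_delay_line([1, 0, 0], [0.5, 0.25j])
print(L, impulse_response)
```

Feeding a unit sample through the line returns the tap weights at delays of one and two
sample periods, which is the discrete counterpart of the impulse response in Equation 13.5-8.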


13.5-2 The RAKE Demodulator 

We now consider the problem of digital signaling over a frequency-selective channel 
that is modeled by a tapped delay line with statistically independent time- variant tap 
weights {c, ,(?)}• It is apparent at the outset, however, that the tapped delay line model 
with statistically independent tap weights provides us with L replicas of the same 
transmitted signal at the receiver. Hence, a receiver that processes the received signal in 
an optimum manner will achieve the performance of an equivalent Lth-order diversity 
communication system. 

Let us consider binary signaling over the channel. We have two equal-energy 
signals sn(t) and sn(t), which are either antipodal or orthogonal. Their time duration T 
is selected to satisfy the condition T T m . Thus, we may neglect any intersymbol 
interference due to multipath. Since the bandwidth of the signal exceeds the coherent 
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bandwidth of the channel, the received signal is expressed as 


L 


n (0 = ^2 C ki t )^li ( J -k/W) + z(t ) 


(13.5-11) 


k= 1 


= 11,(0 + z(0, 0 < t < T, i = 1,2 


where $z(t)$ is a complex-valued zero-mean white Gaussian noise process. Assume for
the moment that the channel tap weights are known. Then the optimum demodulator
consists of two filters matched to $v_1(t)$ and $v_2(t)$. The demodulator output is sampled at
the symbol rate and the samples are passed to a decision circuit that selects the signal
corresponding to the largest output. An equivalent optimum demodulator employs
cross correlation instead of matched filtering. In either case, the decision variables for
coherent detection of the binary signals can be expressed as

$$
\begin{aligned}
U_m &= \mathrm{Re}\left[\int_0^T r_l(t)\, v_m^*(t)\, dt\right] \\
&= \mathrm{Re}\left[\sum_{k=1}^{L} \int_0^T r_l(t)\, c_k^*(t)\, s_{lm}^*(t - k/W)\, dt\right], \qquad m = 1, 2
\end{aligned}
\tag{13.5-12}
$$

Figure 13.5-2 illustrates the operations involved in the computation of the decision
variables. In this realization of the optimum receiver, the two reference signals are
delayed and correlated with the received signal $r_l(t)$.

An alternative realization of the optimum demodulator employs a single delay line
through which is passed the received signal $r_l(t)$. The signal at each tap is correlated
with $c_k^*(t)\, s_{lm}^*(t)$, where $k = 1, 2, \ldots, L$ and $m = 1, 2$. This receiver structure is shown
in Figure 13.5-3. In effect, the tapped delay line demodulator attempts to collect the
signal energy from all the received signal paths that fall within the span of the delay
line and carry the same information. Its action is somewhat analogous to an ordinary
garden rake and, consequently, the name “RAKE demodulator” has been coined for this
demodulator structure by Price and Green (1958). The taps on the RAKE demodulator
are often called “RAKE fingers.”
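
The tapped-delay-line correlation described above can be illustrated with a small discrete-time sketch. It is only an illustration under assumptions that are not part of the text: one sample per chip, a 13-chip Barker code standing in for the pseudorandom spreading waveform, tap delays of 0, 1, 2 chips in place of $k/W$, $k = 1, \ldots, L$, and illustrative tap values. The function names are hypothetical.

```python
# Discrete-time sketch of a coherent RAKE demodulator for antipodal signaling.
# Assumptions (not from the text): one sample per chip, a 13-chip Barker code as
# the spreading waveform, and tap delays of 0, 1, 2 chips instead of k/W.
import random

BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def multipath_channel(bit, code, taps, noise_std=0.0, rng=None):
    """Return r[n] = sum_k c_k * bit * code[n - k] + z[n] (complex baseband)."""
    rng = rng or random.Random(0)
    received = [0j] * (len(code) + len(taps) - 1)
    for k, c_k in enumerate(taps):
        for n, chip in enumerate(code):
            received[n + k] += c_k * bit * chip
    if noise_std > 0.0:
        received = [
            r + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            for r in received
        ]
    return received


def rake_decision_variable(received, code, taps):
    """U = Re[ sum_k conj(c_k) * sum_n r[n + k] * code[n] ], one finger per tap."""
    total = 0j
    for k, c_k in enumerate(taps):
        # Finger k: correlate the received signal, delayed by k chips, with the code.
        finger = sum(received[n + k] * chip for n, chip in enumerate(code))
        total += c_k.conjugate() * finger
    return total.real


taps = [1.0 + 0j, 0.6j, -0.4 + 0j]  # known tap weights c_k (illustrative values)
for bit in (+1, -1):
    r = multipath_channel(bit, BARKER13, taps)
    u = rake_decision_variable(r, BARKER13, taps)
    print(bit, round(u, 3), 1 if u >= 0 else -1)
```

Each finger weights its correlator output by the conjugate tap value, so the contributions of all resolvable paths add coherently before the sign decision.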


13.5-3 Performance of RAKE Demodulator 

We shall now evaluate the performance of the RAKE demodulator under the condition
that the fading is sufficiently slow to allow us to estimate $c_k(t)$ perfectly (without noise).
Furthermore, within any one signaling interval, $c_k(t)$ is treated as a constant and denoted
as $c_k$. Thus the decision variables in Equation 13.5-12 may be expressed in the form

$$
U_m = \mathrm{Re}\left[\sum_{k=1}^{L} c_k^* \int_0^T r_l(t)\, s_{lm}^*(t - k/W)\, dt\right], \qquad m = 1, 2
\tag{13.5-13}
$$


Chapter Thirteen: Fading Channels I: Characterization and Signaling 





FIGURE 13.5-2 

Optimum demodulator for wideband binary signals (delayed reference configuration). 


Suppose the transmitted signal is $s_{l1}(t)$; then the received signal is

$$
r_l(t) = \sum_{n=1}^{L} c_n\, s_{l1}(t - n/W) + z(t), \qquad 0 \le t \le T
\tag{13.5-14}
$$

Substitution of Equation 13.5-14 into Equation 13.5-13 yields

$$
\begin{aligned}
U_m ={}& \mathrm{Re}\left[\sum_{k=1}^{L} c_k^* \sum_{n=1}^{L} c_n \int_0^T s_{l1}(t - n/W)\, s_{lm}^*(t - k/W)\, dt\right] \\
&+ \mathrm{Re}\left[\sum_{k=1}^{L} c_k^* \int_0^T z(t)\, s_{lm}^*(t - k/W)\, dt\right], \qquad m = 1, 2
\end{aligned}
\tag{13.5-15}
$$

FIGURE 13.5-3

Optimum demodulator for wideband binary signals (delayed received signal configuration).


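Usually the wideband signals $s_{l1}(t)$ and $s_{l2}(t)$ are generated from pseudorandom
sequences, which result in signals that have the property

$$
\int_0^T s_{li}(t - n/W)\, s_{li}^*(t - k/W)\, dt \approx 0, \qquad k \ne n, \quad i = 1, 2
\tag{13.5-16}
$$

If we assume that our binary signals are designed to satisfy this property, then
Equation 13.5-15 simplifies to†

$$
\begin{aligned}
U_m ={}& \mathrm{Re}\left[\sum_{k=1}^{L} |c_k|^2 \int_0^T s_{l1}(t - k/W)\, s_{lm}^*(t - k/W)\, dt\right] \\
&+ \mathrm{Re}\left[\sum_{k=1}^{L} c_k^* \int_0^T z(t)\, s_{lm}^*(t - k/W)\, dt\right], \qquad m = 1, 2
\end{aligned}
\tag{13.5-17}
$$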


†Although the orthogonality property specified by Equation 13.5-16 can be satisfied by proper selection
of the pseudorandom sequences, the cross correlation of $s_{l1}(t - n/W)$ with $s_{li}^*(t - k/W)$ gives rise to a
signal-dependent self-noise, which ultimately limits the performance. For simplicity, we do not consider
the self-noise term in the following calculations. Consequently, the performance results presented below
should be considered as lower bounds (ideal RAKE). An approximation to the performance of the RAKE
can be obtained by treating the self-noise as an additional Gaussian noise component with noise power
equal to its variance.

When the binary signals are antipodal, a single decision variable suffices. In this
case, Equation 13.5-17 reduces to

$$
U_1 = \mathrm{Re}\left(2\mathcal{E} \sum_{k=1}^{L} \alpha_k^2 + \sum_{k=1}^{L} \alpha_k N_k\right)
\tag{13.5-18}
$$

where $\alpha_k = |c_k|$ and

$$
N_k = e^{-j\phi_k} \int_0^T z(t)\, s_l^*(t - k/W)\, dt
\tag{13.5-19}
$$


But Equation 13.5-18 is identical to the decision variable given in Equation 13.4-4,
which corresponds to the output of a maximal ratio combiner in a system with Lth-order
diversity. Consequently, the RAKE demodulator with perfect (noiseless) estimates of
the channel tap weights is equivalent to a maximal ratio combiner in a system with
Lth-order diversity. Thus, when all the tap weights have the same mean-square value,
i.e., $E(\alpha_k^2)$ is the same for all $k$, the error rate performance of the RAKE demodulator
is given by Equations 13.4-15 and 13.4-16. On the other hand, when the mean-square
values $E(\alpha_k^2)$ are not identical for all $k$, the derivation of the error rate performance
must be repeated since Equation 13.4-15 no longer applies.

We shall derive the probability of error for binary antipodal and orthogonal signals
under the condition that the mean-square values of $\{c_k\}$ are distinct. We begin with the
conditional error probability

$$
P_b(\gamma_b) = Q\left(\sqrt{\gamma_b (1 - \rho_r)}\right)
\tag{13.5-20}
$$

where $\rho_r = -1$ for antipodal signals, $\rho_r = 0$ for orthogonal signals, and

$$
\gamma_b = \frac{\mathcal{E}}{N_0} \sum_{k=1}^{L} \alpha_k^2 = \sum_{k=1}^{L} \gamma_k
\tag{13.5-21}
$$

Each of the $\{\gamma_k\}$ is distributed according to a chi-squared distribution with two
degrees of freedom. That is,

$$
p(\gamma_k) = \frac{1}{\bar{\gamma}_k}\, e^{-\gamma_k / \bar{\gamma}_k}
\tag{13.5-22}
$$

where $\bar{\gamma}_k$ is the average SNR for the kth path, defined as

$$
\bar{\gamma}_k = \frac{\mathcal{E}}{N_0}\, E(\alpha_k^2)
\tag{13.5-23}
$$

Furthermore, from Equation 13.4-10 we know that the characteristic function of $\gamma_k$ is

$$
\Phi_{\gamma_k}(v) = \frac{1}{1 - jv\bar{\gamma}_k}
\tag{13.5-24}
$$

Since $\gamma_b$ is the sum of $L$ statistically independent components $\{\gamma_k\}$, the
characteristic function of $\gamma_b$ is

$$
\Phi_{\gamma_b}(v) = \prod_{k=1}^{L} \frac{1}{1 - jv\bar{\gamma}_k}
\tag{13.5-25}
$$

The inverse Fourier transform of the characteristic function in Equation 13.5-25 yields
the probability density function of $\gamma_b$ in the form

$$
p(\gamma_b) = \sum_{k=1}^{L} \frac{\pi_k}{\bar{\gamma}_k}\, e^{-\gamma_b / \bar{\gamma}_k}, \qquad \gamma_b \ge 0
\tag{13.5-26}
$$

where $\pi_k$ is defined as

$$
\pi_k = \prod_{\substack{i=1 \\ i \ne k}}^{L} \frac{\bar{\gamma}_k}{\bar{\gamma}_k - \bar{\gamma}_i}
\tag{13.5-27}
$$


When the conditional error probability in Equation 13.5-20 is averaged over the
probability density function given in Equation 13.5-26, the result is

$$
P_b = \frac{1}{2} \sum_{k=1}^{L} \pi_k \left[1 - \sqrt{\frac{\bar{\gamma}_k (1 - \rho_r)}{2 + \bar{\gamma}_k (1 - \rho_r)}}\,\right]
\tag{13.5-28}
$$

This error probability can be approximated as ($\bar{\gamma}_k \gg 1$)

$$
P_b \approx \binom{2L - 1}{L} \prod_{k=1}^{L} \frac{1}{2\bar{\gamma}_k (1 - \rho_r)}
\tag{13.5-29}
$$

By comparing Equation 13.5-29 for $\rho_r = -1$ with Equation 13.4-18, we observe that
the same type of asymptotic behavior is obtained for the case of unequal SNR per path
and the case of equal SNR per path.
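
As a numerical illustration of Equations 13.5-27 through 13.5-29, the following sketch evaluates the exact and the high-SNR error probabilities for a set of distinct average path SNRs. The function names and the SNR values are hypothetical choices made for illustration only; the SNRs are linear ratios (not dB) and must be distinct, since Equation 13.5-27 is undefined otherwise.

```python
import math


def pi_weights(avg_snrs):
    """pi_k = prod_{i != k} g_k / (g_k - g_i), Equation 13.5-27 (distinct SNRs only)."""
    weights = []
    for k, g_k in enumerate(avg_snrs):
        w = 1.0
        for i, g_i in enumerate(avg_snrs):
            if i != k:
                w *= g_k / (g_k - g_i)
        weights.append(w)
    return weights


def rake_error_probability(avg_snrs, rho_r):
    """Equation 13.5-28; rho_r = -1 (antipodal) or 0 (orthogonal)."""
    total = 0.0
    for pi_k, g_k in zip(pi_weights(avg_snrs), avg_snrs):
        x = g_k * (1.0 - rho_r)
        total += pi_k * (1.0 - math.sqrt(x / (2.0 + x)))
    return 0.5 * total


def rake_error_probability_high_snr(avg_snrs, rho_r):
    """Equation 13.5-29, meaningful when every average SNR is much larger than 1."""
    num_paths = len(avg_snrs)
    product = 1.0
    for g_k in avg_snrs:
        product /= 2.0 * g_k * (1.0 - rho_r)
    return math.comb(2 * num_paths - 1, num_paths) * product


snrs = [100.0, 200.0, 400.0]  # illustrative average SNRs per path (linear scale)
print(rake_error_probability(snrs, rho_r=-1))
print(rake_error_probability_high_snr(snrs, rho_r=-1))
```

Because the weights $\pi_k$ alternate in sign, Equation 13.5-28 involves cancellation between terms; with nearly equal SNRs the direct sum becomes numerically delicate, and the equal-SNR result of Equation 13.4-15 is the better choice.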

In the derivation of the error rate performance of the RAKE demodulator, we
assumed that the estimates of the channel tap weights are perfect. In practice, relatively
good estimates can be obtained if the channel fading is sufficiently slow, e.g., $(\Delta t)_c / T \ge 100$,
where $T$ is the signaling interval. Figure 13.5-4 illustrates a method for estimating
the tap weights when the binary signaling waveforms are orthogonal. The estimate is
the output of the low-pass filter at each tap. At any one instant in time, the incoming
signal is either $s_{l1}(t)$ or $s_{l2}(t)$. Hence, the input to the low-pass filter used to estimate
$c_k(t)$ contains signal plus noise from one of the correlators and noise only from the
other correlator. This method for channel estimation is not appropriate for antipodal
signals, because the addition of the two correlator outputs results in signal cancellation.
Instead, a single correlator can be employed for antipodal signals. Its output is fed
to the input of the low-pass filter after the information-bearing signal is removed. To
accomplish this, we must introduce a delay of one signaling interval into the channel
estimation procedure, as illustrated in Figure 13.5-5. That is, first the receiver must
decide whether the information in the received signal is $+1$ or $-1$ and, then, it uses the
decision to remove the information from the correlator output prior to feeding it to the
low-pass filter.

FIGURE 13.5-4

Channel tap weight estimation with binary orthogonal signals.

If we choose not to estimate the tap weights of the frequency-selective channel, we
may use either DPSK signaling or noncoherently detected orthogonal signaling. The
RAKE demodulator structure for DPSK is illustrated in Figure 13.5-6. It is apparent that
when the transmitted signal waveform $s_l(t)$ satisfies the orthogonality property given in
Equation 13.5-16, the decision variable is identical to that given in Equation 13.4-23 for
an Lth-order diversity system. Consequently, the error rate performance of the RAKE
demodulator for binary DPSK is identical to that given in Equation 13.4-15 with
$\mu = \bar{\gamma}_c / (1 + \bar{\gamma}_c)$, when all the signal paths have the same SNR $\bar{\gamma}_c$. On the other hand,
when the SNRs $\{\bar{\gamma}_k\}$ are distinct, the error probability can be obtained by averaging
Equation 13.4-24, which is the probability of error conditioned on a time-invariant
channel, over the probability density function of $\gamma_b$ given by Equation 13.5-26. The
result of this integration is

$$
P_b = \left(\frac{1}{2}\right)^{2L-1} \sum_{m=0}^{L-1} m!\, b_m \sum_{k=1}^{L} \frac{\pi_k}{\bar{\gamma}_k} \left(\frac{\bar{\gamma}_k}{1 + \bar{\gamma}_k}\right)^{m+1}
\tag{13.5-30}
$$

where $\pi_k$ is defined in Equation 13.5-27 and $b_m$ in Equation 13.4-25.
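
Equation 13.5-30 can be evaluated numerically as sketched below. The coefficient $b_m$ is defined in Equation 13.4-25, which lies outside this section; the sketch assumes the form $b_m = \frac{1}{m!}\sum_{n=0}^{L-1-m}\binom{2L-1}{n}$, and that form should be checked against Equation 13.4-25 before the code is relied upon. Function names and SNR values are illustrative.

```python
import math


def pi_weights(avg_snrs):
    """pi_k of Equation 13.5-27; requires distinct average SNRs."""
    weights = []
    for k, g_k in enumerate(avg_snrs):
        w = 1.0
        for i, g_i in enumerate(avg_snrs):
            if i != k:
                w *= g_k / (g_k - g_i)
        weights.append(w)
    return weights


def b_coefficient(m, num_paths):
    """Assumed form of b_m (Equation 13.4-25): (1/m!) * sum_{n=0}^{L-1-m} C(2L-1, n)."""
    partial = sum(math.comb(2 * num_paths - 1, n) for n in range(num_paths - m))
    return partial / math.factorial(m)


def dpsk_rake_error_probability(avg_snrs):
    """Equation 13.5-30 for binary DPSK with distinct average path SNRs."""
    num_paths = len(avg_snrs)
    weights = pi_weights(avg_snrs)
    total = 0.0
    for m in range(num_paths):
        inner = sum(
            (pi_k / g_k) * (g_k / (1.0 + g_k)) ** (m + 1)
            for pi_k, g_k in zip(weights, avg_snrs)
        )
        total += math.factorial(m) * b_coefficient(m, num_paths) * inner
    return total / 2 ** (2 * num_paths - 1)


print(dpsk_rake_error_probability([10.0]))        # single path
print(dpsk_rake_error_probability([10.0, 20.0]))  # two paths with distinct SNRs
```

For a single path the expression collapses to $1/[2(1 + \bar{\gamma})]$, the familiar result for binary DPSK in Rayleigh fading, which gives a quick consistency check on the assumed $b_m$.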

Finally, we consider binary orthogonal signaling over the frequency-selective channel
with square-law detection at the receiver. This type of signal is appropriate when
the fading is rapid enough to preclude a good estimate of the channel tap weights.
The RAKE demodulator with square-law combining of the signal from each tap is
illustrated in Figure 13.5-7. In computing its performance, we again assume that the
orthogonality property given in Equation 13.5-16 holds. Then the decision variables at
the output of the RAKE are

$$
\begin{aligned}
U_1 &= \sum_{k=1}^{L} \left|2\mathcal{E} c_k + N_{k1}\right|^2 \\
U_2 &= \sum_{k=1}^{L} \left|N_{k2}\right|^2
\end{aligned}
\tag{13.5-31}
$$

where we have assumed that $s_{l1}(t)$ was the transmitted signal. Again we observe that the
decision variables are identical to the ones given in Equation 13.4-29, which apply to
orthogonal signals with Lth-order diversity. Therefore, the performance of the RAKE
demodulator for square-law-detected orthogonal signals is given by Equation 13.4-15
with $\mu = \bar{\gamma}_c / (2 + \bar{\gamma}_c)$ when all the signal paths have the same SNR. If the SNRs are
distinct, we can average the conditional error probability given by Equation 13.4-24,
with $\gamma_b$ replaced by $\frac{1}{2}\gamma_b$, over the probability density function $p(\gamma_b)$ given in
Equation 13.5-26. The result of this averaging is given by Equation 13.5-30, with $\bar{\gamma}_k$ replaced
by $\frac{1}{2}\bar{\gamma}_k$.

FIGURE 13.5-5

Channel tap weight estimation with binary antipodal signals.

FIGURE 13.5-6

RAKE demodulator for DPSK signals.

FIGURE 13.5-7

RAKE demodulator for square-law combination of orthogonal signals.

In the above analysis, the RAKE demodulator shown in Figure 13.5-7 for square-law
combining of orthogonal signals is assumed to contain a signal component at each
delay. If that is not the case, its performance will be degraded, since some of the tap
correlators will contribute only noise. Under such conditions, the low-level, noise-only
contributions from the tap correlators should be excluded from the combiner, as shown
by Chyi et al. (1988).

The configurations of the RAKE demodulator presented in this section can be
easily generalized to multilevel signaling. In fact, if M-ary PSK or DPSK is chosen,
the RAKE structures presented in this section remain unchanged. Only the PSK and
DPSK detectors that follow the RAKE correlator are different.

Generalized RAKE Demodulator 

The RAKE demodulator described above is the optimum demodulator when the ad- 
ditive noise is white and Gaussian. However, there are communication scenarios in 
which additive interference from other users of the channel results in colored additive 
noise. This is the case, for example, in the downlink of a cellular communication sys- 
tem employing CDMA as a multiple access method. In this case, the spread spectrum 
signals transmitted from a base station to the mobile receivers carry information on 
synchronously transmitted orthogonal spreading codes. However, in transmission over 
a frequency-selective channel, the orthogonality of the code sequences is destroyed by 
the channel time dispersion due to multipath. As a consequence, the RAKE demodu- 
lator for any given mobile receiver must demodulate its desired signal in the presence 
of additional additive interference resulting from the cross-correlations of its desired 
spreading code sequence with the multipath corrupted code sequences that are assigned 
to the other mobile users. This additional interference is generally characterized as col- 
ored Gaussian noise, as shown by Bottomley (1993) and Klein (1997). 

A model for the downlink transmission in a CDMA cellular communication system
is illustrated in Figure 13.5-8. The base station transmits the combined signal

$$
s(t) = \sum_{k=1}^{K} s_k(t)
\tag{13.5-32}
$$

to the $K$ mobile terminals, where each $s_k(t)$ is a spread spectrum signal intended for the
kth user and the corresponding spreading code for the kth user is orthogonal with each
of the spreading codes of the other $K - 1$ users. We assume that the signals propagate
through a channel characterized by the baseband equivalent lowpass, time-invariant
impulse response

$$
c_k(\tau) = \sum_{i=1}^{L_k} c_{ki}\, \delta(\tau - \tau_{ki}), \qquad k = 1, 2, \ldots, K
\tag{13.5-33}
$$

where $L_k$ is the number of resolvable multipath components, $\{c_{ki}\}$ are the complex-valued
coefficients, and $\{\tau_{ki}\}$ are the corresponding time delays. To simplify this presentation,
we focus on the processing at the receiver of the first user ($k = 1$) and drop
the index $k$. In a CDMA cellular system, an unmodulated spread spectrum signal, say
$s_0(t)$, is transmitted along with the information-bearing signals and serves as a pilot
signal that is used by each mobile receiver to estimate the channel coefficients $\{c_i\}$ and
the time delays $\{\tau_i\}$.

FIGURE 13.5-8

Model for the downlink transmission of a CDMA cellular communication system.

FIGURE 13.5-9

Structure of generalized RAKE demodulator.

A conventional RAKE demodulator would consist of $L$ “fingers” with each finger
corresponding to one of the $L$ channel delays, and the weights at the $L$ fingers would be
$\{c_i^*\}$, the complex conjugates of the corresponding channel coefficients. In contrast, a
generalized RAKE demodulator consists of $L_g > L$ RAKE fingers, and the weights at
the $L_g$ fingers, denoted as $\{w_i\}$, are different from $\{c_i^*\}$. The structure of the generalized
RAKE demodulator is illustrated in Figure 13.5-9 for phase coherent modulation such
as PSK or QAM. The decision variable $U$ at the detector may be expressed as

$$
U = \mathbf{w}^H \mathbf{y}
\tag{13.5-34}
$$

It is convenient to express the received vector $\mathbf{y}$ at the output of the
cross-correlators as

$$
\mathbf{y} = \mathbf{g} b + \mathbf{z}
\tag{13.5-35}
$$

where $\mathbf{g}$ is a vector of complex-valued elements which result from the cross-correlations
of the desired received signal, say $s_1(t) * c_1(t)$, with the corresponding spreading
sequence at the $L_g$ delays, $b$ is the desired symbol to be detected, and $\mathbf{z}$ represents the
vector of additive Gaussian noise plus interference resulting from the cross-correlations
of the spreading sequence with the received signals of the other users and intersymbol
interference due to channel multipath. For a sufficiently large number of users and
channel multipath components, the vector $\mathbf{z}$ may be characterized as complex-valued
Gaussian with zero mean and covariance matrix $\mathbf{R}_z = E[\mathbf{z}\mathbf{z}^H]$. Based on this
statistical characterization of $\mathbf{z}$, the RAKE finger weight vector for maximum-likelihood
detection is given as

$$
\mathbf{w} = \mathbf{R}_z^{-1} \mathbf{g}
\tag{13.5-36}
$$

Given the channel impulse response, the implementation of the maximum-likelihood
detector requires the evaluation of the covariance matrix $\mathbf{R}_z$ and the desired signal
vector $\mathbf{g}$. The procedure for evaluation of these parameters has been described in a paper
by Bottomley et al. (2000). Also investigated in this paper is the selection of the number
of RAKE fingers and the selection of the corresponding delays for different channel
characteristics.
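
The weight computation of Equation 13.5-36 and the decision variable of Equation 13.5-34 amount to solving one linear system per update. The following sketch does this with a small Gauss-Jordan solver so that it needs no external library; the covariance matrix and the vector $\mathbf{g}$ used in the example are arbitrary illustrative values, not estimates obtained by the procedure of Bottomley et al. (2000), and the function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[complex(v) for v in row] + [complex(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or badly conditioned")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] for i in range(n)]


def grake_weights(noise_covariance, g):
    """w = R_z^{-1} g (Equation 13.5-36)."""
    return solve_linear_system(noise_covariance, g)


def decision_variable(weights, y):
    """U = w^H y (Equation 13.5-34)."""
    return sum(w.conjugate() * y_i for w, y_i in zip(weights, y))


# Illustrative three-finger example with correlated (colored) interference.
R_z = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.5], [0.1, 0.5, 1.0]]
g = [1.0 + 0j, 0.5j, 0.2 + 0j]
w = grake_weights(R_z, g)
y = [g_i * 1 for g_i in g]  # noiseless received vector for symbol b = +1
print(w)
print(decision_variable(w, y))
```

When $\mathbf{R}_z$ is a scaled identity matrix (white noise), the solution is proportional to $\mathbf{g}$, so the weights reduce to those of a matched, conventional RAKE combiner.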

In the description of the generalized RAKE demodulator given above, we assumed
that the channel is time-invariant. In a randomly time-variant channel, the position of
the RAKE fingers and the weights $\{w_i\}$ must be varied according to the characteristics
of the channel impulse response. The pilot signal transmitted by the base station to
the mobile receivers is used to estimate the channel impulse response, from which the
finger placement and weights $\{w_i\}$ can be determined adaptively. The interested reader
is referred to the paper by Bottomley et al. (2000) for a detailed description of the
performance of the generalized RAKE demodulator for some channel models.


13.5-4 Receiver Structures for Channels with Intersymbol Interference 

As described above, the wideband signal waveforms that are transmitted through the 
multipath channels resolve the multipath components with a time resolution of 1/ W, 
where W is the signal bandwidth. Usually, such wideband signals are generated as 
direct sequence spread spectrum signals, in which the PN spreading sequences are 
the outputs of linear feedback shift registers, e.g., maximum-length linear feedback 
shift registers. The modulation impressed on the sequences may be binary PSK, QPSK, 
DPSK, or binary orthogonal. The desired bit rate determines the bit interval or symbol 
interval. 

The RAKE demodulator that we described above is the optimum demodulator
based on the condition that the bit interval $T_b \gg T_m$, i.e., there is negligible ISI. When
this condition is not satisfied, the RAKE demodulator output is corrupted by ISI. In
such a case, an equalizer is required to suppress the ISI.

To be specific, we assume that binary PSK modulation is used and spread by a
PN sequence. The bandwidth of the transmitted signal is sufficiently broad to resolve
two or more multipath components. At the receiver, after the signal is demodulated to
baseband, it may be processed by the RAKE, which is the matched filter to the channel
response, followed by an equalizer to suppress the ISI. The RAKE output is sampled
at the bit rate, and these samples are passed to the equalizer. An appropriate equalizer,
in this case, would be a maximum-likelihood sequence estimator implemented by use
of the Viterbi algorithm or a decision feedback equalizer (DFE). This demodulator
structure is shown in Figure 13.5-10.

FIGURE 13.5-10

Receiver structure for processing wideband signal corrupted by ISI.

Other receiver structures are also possible. If the period of the PN sequence is equal
to the bit interval, i.e., $LT_c = T_b$, where $T_c$ is the chip interval and $L$ is the number of
chips per bit, a fixed filter matched to the spreading sequence may be used to process
the received signal and followed by an adaptive equalizer, such as a fractionally spaced
DFE, as shown in Figure 13.5-11. In this case, the matched filter output is sampled
at some multiple of the chip rate, e.g., twice the chip rate, and fed to the fractionally
spaced DFE. The feedback filter in the DFE would have taps spaced at the bit interval.
The adaptive DFE would require a training sequence for adjustment of its coefficients
to the channel multipath structure.

An even simpler receiver structure is one in which the spread spectrum matched 
filter is replaced by a low-pass filter whose bandwidth is matched to the transmitted 
signal bandwidth. The output of such a filter may be sampled at an integer multiple 
of the chip rate and the samples are passed to an adaptive fractionally spaced DFE. In 
this case, the coefficients of the feedback filter in the DFE, with the aid of a training
sequence, will adapt to the combination of the spreading sequence and the channel 
multipath. Abdulrahman et al. (1994) consider the use of a DFE to suppress ISI in a 
CDMA system in which each user employs a wideband direct sequence spread spectrum 
signal. 

The paper by Taylor et al. (1998) provides a broad survey of equalization techniques
and their performance for wireless channels.



FIGURE 13.5-11 

Alternative receiver structure for processing wideband signal corrupted by ISI. 


13.6

MULTICARRIER MODULATION (OFDM)


Multicarrier modulation was introduced in Chapter 11 (Section 11.2), and a special 
form of multicarrier transmission, called orthogonal frequency-division multiplexing 
(OFDM), was treated in detail. In this section, we consider the use of OFDM for digital 
transmission on fading multipath channels. 

From our previous discussion, we have observed that OFDM is an attractive
alternative to single-carrier modulation for use in time-dispersive channels. By selecting
the symbol duration in an OFDM system to be significantly larger than the channel
dispersion, intersymbol interference (ISI) can be rendered negligible, and it can be completely
eliminated by use of a time guard band or, equivalently, by the use of a cyclic
prefix embedded in the OFDM signal. The elimination of ISI due to multipath dispersion,
without the use of complex equalizers, is a basic motivation for use of OFDM for digital
communication in fading multipath channels. However, OFDM is especially vulnera- 
ble to Doppler spread resulting from time variations in the channel impulse response, 
as is the case in mobile communication systems. The Doppler spreading destroys the 
orthogonality of the OFDM subcarriers and results in intercarrier interference (ICI) 
which can severely degrade the performance of the OFDM system. In the following 
section we evaluate the effect of a Doppler spread on the performance of OFDM. 


13.6-1 Performance Degradation of an OFDM System due 
to Doppler Spreading 

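Let us consider an OFDM system with $N$ subcarriers $\{e^{j2\pi f_k t}\}$, where each subcarrier
employs either M-ary QAM or PSK modulation. The subcarriers are orthogonal over
the symbol duration $T$, i.e., $f_k = k/T$, $k = 1, 2, \ldots, N$, so that

$$
\frac{1}{T} \int_0^T e^{j2\pi f_k t}\, e^{-j2\pi f_i t}\, dt =
\begin{cases}
1 & k = i \\
0 & k \ne i
\end{cases}
\tag{13.6-1}
$$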

The channel is modeled as a frequency-selective randomly varying channel with
impulse response $c(\tau; t)$. Within the frequency band of each subcarrier, the channel is
modeled as a frequency-nonselective Rayleigh fading channel with impulse response

$$
c_k(\tau; t) = \alpha_k(t)\, \delta(\tau), \qquad k = 0, 1, \ldots, N - 1
\tag{13.6-2}
$$

It is assumed that the processes $\{\alpha_k(t),\ k = 0, 1, \ldots, N - 1\}$ are complex-valued,
jointly stationary, and jointly Gaussian with zero means and cross-covariance function

$$
R_{\alpha_k \alpha_i}(\tau) = E\left[\alpha_k(t + \tau)\, \alpha_i^*(t)\right], \qquad k, i = 0, 1, \ldots, N - 1
\tag{13.6-3}
$$

For each fixed $k$, the real and imaginary parts of the process $\alpha_k(t)$ are assumed
independent with identical covariance function. It is further assumed that the covariance
function $R_{\alpha_k \alpha_i}(\tau)$ has the following factorable form

$$
R_{\alpha_k \alpha_i}(\tau) = R_1(\tau)\, R_2(k - i)
\tag{13.6-4}
$$

which is sufficient to represent the frequency selectivity and the time-varying effects
of the channel. $R_1(\tau)$ represents the temporal correlation of the process $\alpha_k(t)$, which is
identical for all $k = 0, 1, \ldots, N - 1$, and $R_2(k)$ represents the correlation in frequency
across subcarriers.

To obtain numerical results, we assume that the power spectral density corresponding
to $R_1(\tau)$ is modeled as in Jakes (1974) and given by (see Figure 13.1-8)

$$
S(f) =
\begin{cases}
\dfrac{1}{\pi f_m \sqrt{1 - (f/f_m)^2}} & |f| \le f_m \\
0 & \text{otherwise}
\end{cases}
\tag{13.6-5}
$$

where $f_m$ is the maximum Doppler frequency. We note that

$$
R_1(\tau) = J_0(2\pi f_m \tau)
\tag{13.6-6}
$$

where $J_0(\cdot)$ is the zero-order Bessel function of the first kind. To specify the correlation
in frequency across the subcarriers, we model the multipath power intensity profile as
an exponential of the form

$$
R_c(\tau) = \beta e^{-\beta \tau}, \qquad \tau > 0, \quad \beta > 0
\tag{13.6-7}
$$

where $\beta$ is a parameter that controls the coherence bandwidth of the channel. The
Fourier transform of $R_c(\tau)$ yields

$$
R_C(f) = \frac{\beta}{\beta + j2\pi f}
\tag{13.6-8}
$$

which provides a measure of the correlation of the fading across the subcarriers, as
shown in Figure 13.6-1. Hence, $R_2(k) = R_C(k/T)$, where $1/T$ is the frequency separation between
two adjacent subcarriers. The 3-dB bandwidth of $R_C(f)$ may be defined as the coherence
bandwidth of the channel and is easily shown to be $\sqrt{3}\beta / 2\pi$.
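
The frequency correlation of Equation 13.6-8 and the stated coherence bandwidth are easy to check numerically. In the sketch below the function names and the values of $\beta$ and $T$ are illustrative assumptions; the check confirms that $|R_C(f)|$ falls to one-half of its value at $f = 0$ when $f = \sqrt{3}\beta/2\pi$, which is the sense in which the text uses the term 3-dB bandwidth.

```python
import math


def frequency_correlation(f, beta):
    """R_C(f) = beta / (beta + j 2 pi f), Equation 13.6-8."""
    return beta / complex(beta, 2.0 * math.pi * f)


def subcarrier_correlation(k, beta, symbol_duration):
    """R_2(k) = R_C(k / T): correlation between subcarriers k spacings apart."""
    return frequency_correlation(k / symbol_duration, beta)


def coherence_bandwidth(beta):
    """Frequency at which |R_C(f)| drops to 1/2: sqrt(3) * beta / (2 pi)."""
    return math.sqrt(3.0) * beta / (2.0 * math.pi)


beta = 1.0e6               # illustrative decay rate of the delay profile, in 1/s
symbol_duration = 1.0e-4   # illustrative OFDM symbol duration T, in seconds
print(coherence_bandwidth(beta))
for k in (0, 1, 10, 100):
    print(k, abs(subcarrier_correlation(k, beta, symbol_duration)))
```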

The channel model described above is suitable for modeling OFDM signal trans- 
mission in mobile radio systems, such as cellular systems and radio broadcasting sys- 
tems. Since the symbol duration T is usually selected to be much larger than the channel 
multipath spread, it is reasonable to model the signal fading as flat over each subcar- 
rier. However, compared with the entire OFDM system bandwidth W, the coherence 
bandwidth of the channel is usually smaller. Hence, the channel is frequency-selective 
over the entire OFDM signal bandwidth. 

Let us now model the time variations of the channel within an OFDM symbol
interval $T$. For mobile radio channels of practical interest, the channel coherence time
is significantly larger than $T$. For such slow fading channels, we may use the two-term
Taylor series expansion, first introduced by Bello (1963), to represent the time-varying
channel variations $\alpha_k(t)$ as

$$
\alpha_k(t) = \alpha_k(t_0) + \alpha_k'(t_0)(t - t_0), \qquad t_0 = \frac{T}{2}, \quad 0 \le t \le T
\tag{13.6-9}
$$

FIGURE 13.6-1

Multipath delay profile and frequency correlation function.

Therefore, the impulse response of the kth subchannel within a symbol interval is
given as

$$
c_k(\tau; t) = \alpha_k(t_0)\, \delta(\tau) + (t - t_0)\, \alpha_k'(t_0)\, \delta(\tau)
\tag{13.6-10}
$$

Since $R_1(\tau)$ given by Equation 13.6-6 is infinitely differentiable, all mean-square
derivatives exist and hence the differentiation of $\alpha_k(t)$ is justified.

Based on the channel model described above, we determine the ICI term at the
detector and evaluate its power. The baseband signal transmitted over the channel is
expressed as

$$
s(t) = \frac{1}{\sqrt{T}} \sum_{k=0}^{N-1} s_k\, e^{j2\pi f_k t}, \qquad 0 \le t \le T
\tag{13.6-11}
$$

where $f_k = k/T$ and $s_k$, $k = 0, 1, \ldots, N - 1$, represents the complex-valued signal
constellation points. We assume that

$$
E\left[|s_k|^2\right] = 2\mathcal{E}_{\mathrm{avg}}
\tag{13.6-12}
$$

where $2\mathcal{E}_{\mathrm{avg}}$ denotes the average symbol energy of each $s_k$.

The received baseband signal may be expressed as

$$
r(t) = \frac{1}{\sqrt{T}} \sum_{k=0}^{N-1} \alpha_k(t)\, s_k\, e^{j2\pi f_k t} + n(t)
\tag{13.6-13}
$$

where $n(t)$ is the additive noise, which is modeled as a complex-valued, zero-mean
Gaussian process that is spectrally flat within the signal bandwidth with spectral
density $2N_0$ W/Hz. By using the two-term Taylor series expansion for $\alpha_k(t)$, $r(t)$ may be
expressed as

$$
r(t) = \frac{1}{\sqrt{T}} \sum_{k=0}^{N-1} \alpha_k(t_0)\, s_k\, e^{j2\pi f_k t}
+ \frac{1}{\sqrt{T}} \sum_{k=0}^{N-1} (t - t_0)\, \alpha_k'(t_0)\, s_k\, e^{j2\pi f_k t} + n(t)
\tag{13.6-14}
$$

The received signal in a symbol interval is passed through a parallel bank of $N$
correlators, where each correlator is tuned to one of the $N$ subcarrier frequencies. The
output of the ith correlator at the sampling instant is

$$
\begin{aligned}
\hat{s}_i &= \frac{1}{\sqrt{T}} \int_0^T r(t)\, e^{-j2\pi f_i t}\, dt \\
&= \alpha_i(t_0)\, s_i + \frac{T}{2\pi j} \sum_{\substack{k=0 \\ k \ne i}}^{N-1} \frac{\alpha_k'(t_0)\, s_k}{k - i} + n_i
\end{aligned}
\tag{13.6-15}
$$

The first term in Equation 13.6-15 represents the desired signal, the second term
represents the ICI, and the third term is the additive noise component.
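
The ICI coefficient $T/[2\pi j(k - i)]$ in Equation 13.6-15 comes from correlating the linear term $(t - t_0)$ of the Taylor expansion against the frequency offset between subcarriers $k$ and $i$. The sketch below checks this by evaluating $\frac{1}{T}\int_0^T (t - T/2)\, e^{j2\pi(k-i)t/T}\, dt$ with a midpoint rule and comparing it with the closed form. The function names, the step count, and the chosen indices are illustrative assumptions.

```python
import cmath
import math


def ici_coefficient_numeric(k, i, symbol_duration, num_steps=20000):
    """Midpoint-rule value of (1/T) * integral_0^T (t - T/2) e^{j 2 pi (k-i) t / T} dt."""
    step = symbol_duration / num_steps
    t0 = symbol_duration / 2.0
    total = 0j
    for n in range(num_steps):
        t = (n + 0.5) * step
        total += (t - t0) * cmath.exp(2j * math.pi * (k - i) * t / symbol_duration) * step
    return total / symbol_duration


def ici_coefficient_closed_form(k, i, symbol_duration):
    """T / (2 pi j (k - i)), the factor multiplying alpha_k'(t0) s_k in Equation 13.6-15."""
    return symbol_duration / (2j * math.pi * (k - i))


T = 1.0
for k, i in ((3, 1), (0, 5), (7, 6)):
    print(k, i, ici_coefficient_numeric(k, i, T), ici_coefficient_closed_form(k, i, T))
```

For $k = i$ the same integral is zero, because $t - T/2$ is odd about the midpoint of the interval; this is why the derivative term contributes no distortion to the desired subcarrier itself.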

The mean-square value of the desired signal component is

$$
\begin{aligned}
S &= E\left[|\alpha_i(t_0)\, s_i|^2\right] \\
&= E\left[|\alpha_i(t_0)|^2\right] E\left[|s_i|^2\right] = 2\mathcal{E}_{\mathrm{avg}}
\end{aligned}
\tag{13.6-16}
$$

where the average channel gain is normalized to unity. The mean-square value of the
ICI term is evaluated as follows. Since $R_{\alpha_k \alpha_k}(\tau) = R_1(\tau)$ is infinitely differentiable, all
(mean-square) derivatives of the process $\alpha_k(t)$, $-\infty < t < \infty$, exist. In particular, the
first derivative $\alpha_k'(t)$ is a zero-mean, complex-valued Gaussian process with correlation
function

$$
E\left[\alpha_k'(t + \tau)\, (\alpha_k'(t))^*\right] = -R_1''(\tau)
\tag{13.6-17}
$$

with corresponding spectral density $(2\pi f)^2 S(f)$. Hence,

$$
E\left[|\alpha_k'(t)|^2\right] = \int_{-f_m}^{f_m} (2\pi f)^2 S(f)\, df = 2\pi^2 f_m^2
\tag{13.6-18}
$$
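
The value $2\pi^2 f_m^2$ in Equation 13.6-18 can be confirmed numerically. The spectrum of Equation 13.6-5 is singular at $f = \pm f_m$, so the sketch below uses the substitution $f = f_m \sin\theta$, under which $S(f)\, df = d\theta / \pi$; the substitution, the function name, and the step count are choices made here for illustration and do not appear in the text.

```python
import math


def derivative_power(max_doppler, num_steps=10000):
    """Midpoint-rule value of integral (2 pi f)^2 S(f) df over |f| <= f_m.

    With f = f_m sin(theta), S(f) df becomes d(theta) / pi, which removes the
    endpoint singularities of the Doppler spectrum in Equation 13.6-5.
    """
    step = math.pi / num_steps
    total = 0.0
    for n in range(num_steps):
        theta = -math.pi / 2.0 + (n + 0.5) * step
        f = max_doppler * math.sin(theta)
        total += (2.0 * math.pi * f) ** 2 * step / math.pi
    return total


f_m = 100.0  # illustrative maximum Doppler frequency in Hz
print(derivative_power(f_m), 2.0 * math.pi ** 2 * f_m ** 2)
```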

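The power in the ICI term is

$$
\begin{aligned}
E\left[\left|\frac{T}{2\pi j} \sum_{k \ne i} \frac{\alpha_k'(t_0)\, s_k}{k - i}\right|^2\right]
={}& \left(\frac{T}{2\pi}\right)^2 \sum_{k \ne i} \sum_{\substack{l \ne i \\ l \ne k}} \frac{1}{(k - i)(l - i)}\, E\left[\alpha_k'(t_0)\, s_k\, (\alpha_l'(t_0))^*\, s_l^*\right] \\
&+ \left(\frac{T}{2\pi}\right)^2 \sum_{k \ne i} \frac{1}{(k - i)^2}\, E\left[|\alpha_k'(t_0)|^2\right] E\left[|s_k|^2\right]
\end{aligned}
\tag{13.6-19}
$$

We note that the pair $(\alpha_k'(t_0), \alpha_l'(t_0))$ is statistically independent of $(s_k, s_l)$.
Furthermore, the $\{s_k\}$ are iid with zero means. Hence, the first term of the right-hand side of
Equation 13.6-19 is zero. Therefore, by using the result from Equation 13.6-18 in
Equation 13.6-19, the power of the ICI component is

$$
I = \frac{(T f_m)^2}{2} \sum_{\substack{k=0 \\ k \ne i}}^{N-1} \frac{2\mathcal{E}_{\mathrm{avg}}}{(k - i)^2}
\tag{13.6-20}
$$

FIGURE 13.6-2

Signal-to-ICI power ratio versus normalized Doppler spread $f_m T$.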


Consequently, the signal-to-interference ratio $S/I$ is given by

$$
\frac{S}{I} = \frac{1}{\dfrac{(T f_m)^2}{2} \displaystyle\sum_{\substack{k=0 \\ k \ne i}}^{N-1} \frac{1}{(k - i)^2}}
\tag{13.6-21}
$$

Graphs of $S/I$ versus $T f_m$ are shown in Figure 13.6-2 for $N = 256$ subcarriers and
$i = N/2$, the interference on the middle subcarrier.
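
Equation 13.6-21 is simple to tabulate. The sketch below evaluates $S/I$ in decibels for $N = 256$ and $i = N/2$ at a few values of the normalized Doppler spread; the function names and the particular values of $T f_m$ are illustrative choices, and the output is not intended to reproduce Figure 13.6-2 point for point.

```python
import math


def signal_to_ici_ratio(normalized_doppler, num_subcarriers, i):
    """S/I of Equation 13.6-21 for subcarrier i and normalized Doppler spread T * f_m."""
    geometry = sum(1.0 / (k - i) ** 2 for k in range(num_subcarriers) if k != i)
    return 1.0 / (0.5 * normalized_doppler ** 2 * geometry)


def to_db(ratio):
    return 10.0 * math.log10(ratio)


for tfm in (0.001, 0.01, 0.05, 0.1):
    print(tfm, round(to_db(signal_to_ici_ratio(tfm, 256, 128)), 2))
```

Since $S/I$ is inversely proportional to $(T f_m)^2$, each tenfold increase in the normalized Doppler spread lowers the ratio by 20 dB. For large $N$ the sum approaches $\pi^2/3$ on the middle subcarrier, so $S/I \approx 6 / (\pi T f_m)^2$ there.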

The evaluation of the effect of the ICI on the error rate performance of an OFDM 
system requires knowledge of the PDF of the ICI which, in general, is a mixture of 
Gaussian PDFs. However, when the number of subcarriers is large, the distribution of 
the ICI can be approximated by a Gaussian distribution, and thus the evaluation of the 
error rate performance is straightforward. 

Figure 13.6-3 illustrates the symbol error probability for an OFDM system having 
N = 256 subcarriers and 16-QAM, where the error probability is evaluated analytically 
based on the Gaussian model for the ICI and by Monte Carlo simulation. We observe that 
the ICI severely degrades the performance of the OFDM system. In the following section 
we describe a method for suppressing the ICI and, thus, improving the performance of 
the OFDM system.

FIGURE 13.6-3

Symbol error probability for 16-QAM OFDM system with N = 256 subcarriers.


13.6-2 Suppression of ICI in OFDM Systems 

The distortion caused by ICI in an OFDM system is akin to the distortion caused by
ISI in a single-carrier system. Recall that a linear time-domain equalizer based on the
minimum mean-square-error (MMSE) criterion is an effective method for suppressing
ISI. In a similar manner, we may apply the MMSE criterion to suppress the ICI in the
frequency domain. Thus, we begin with the $N$ frequency samples at the output of the
discrete Fourier transform (DFT) processor, which we denote by the vector $\mathbf{R}(m)$ for
the mth frame. Then we form the estimate of the symbol $s_k(m)$ as

$$
\hat{s}_k(m) = \mathbf{b}_k^H(m)\, \mathbf{R}(m), \qquad k = 0, 1, \ldots, N - 1
\tag{13.6-22}
$$

where $\mathbf{b}_k(m)$ is the coefficient vector of size $N \times 1$. This vector is selected to minimize
the MSE

$$
E\left[|s_k(m) - \hat{s}_k(m)|^2\right] = E\left[|s_k(m) - \mathbf{b}_k^H(m)\, \mathbf{R}(m)|^2\right]
\tag{13.6-23}
$$

where the expectation is taken with respect to the signal and noise statistics. By applying
the orthogonality principle, the optimum coefficient vector is obtained as

$$
\mathbf{b}_k(m) = \left[\mathbf{G}(m)\mathbf{G}^H(m) + \sigma^2 \mathbf{I}_N\right]^{-1} \mathbf{g}_k(m), \qquad k = 0, 1, \ldots, N - 1
\tag{13.6-24}
$$

where

$$
\begin{aligned}
E\left[\mathbf{R}(m)\mathbf{R}^H(m)\right] &= \mathbf{G}(m)\mathbf{G}^H(m) + \sigma^2 \mathbf{I}_N \\
E\left[\mathbf{R}(m)\, s_k^*(m)\right] &= \mathbf{g}_k(m)
\end{aligned}
\tag{13.6-25}
$$

and $\mathbf{G}(m)$ is related to the channel impulse response matrix $\mathbf{H}(m)$ through the DFT
relation (see Problem 13.16)

$$
\mathbf{G}(m) = \mathbf{W}^H \mathbf{H}(m) \mathbf{W}
\tag{13.6-26}
$$

where $\mathbf{W}$ is the orthonormal (IDFT) transformation matrix. The vector $\mathbf{g}_k(m)$ is the kth
column of the matrix $\mathbf{G}(m)$, and $\sigma^2$ is the variance of the additive noise component.
It is easily shown that the minimum MSE for the signal on the kth subcarrier may be
expressed as

$$
E\left[|s_k(m) - \hat{s}_k(m)|^2\right] = 1 - \mathbf{g}_k^H(m) \left(\mathbf{G}(m)\mathbf{G}^H(m) + \sigma^2 \mathbf{I}_N\right)^{-1} \mathbf{g}_k(m)
\tag{13.6-27}
$$

We observe that the optimum weight vectors $\{\mathbf{b}_k(m)\}$ require knowledge of the
channel impulse response. In practice, the channel response may be estimated by
periodically transmitting pilot signals on each of the subcarriers and by employing a
decision-directed method when data are transmitted on the $N$ subcarriers. In a slowly
fading channel, the coefficient vectors $\{\mathbf{b}_k(m)\}$ may also be adjusted recursively by
employing either an LMS- or an RLS-type algorithm, as previously described in the
context of equalization for suppression of ISI.
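
For a known matrix $\mathbf{G}(m)$, Equations 13.6-24 and 13.6-27 reduce to one linear solve per subcarrier. The sketch below carries this out for a small, made-up $3 \times 3$ matrix whose off-diagonal entries play the role of ICI; the matrix values, the noise variance, and the function names are illustrative assumptions, and the symbols are taken to have unit mean-square value, as the leading 1 in Equation 13.6-27 implies.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[complex(v) for v in row] + [complex(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or badly conditioned")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] for i in range(n)]


def mmse_coefficients(G, noise_variance, k):
    """Return (b_k, minimum MSE) from Equations 13.6-24 and 13.6-27."""
    n = len(G)
    # A = G G^H + sigma^2 I_N
    A = [
        [
            sum(complex(G[r][j]) * complex(G[c][j]).conjugate() for j in range(n))
            + (noise_variance if r == c else 0.0)
            for c in range(n)
        ]
        for r in range(n)
    ]
    g_k = [complex(G[r][k]) for r in range(n)]  # kth column of G
    b_k = solve_linear_system(A, g_k)
    mse = 1.0 - sum(g.conjugate() * b for g, b in zip(g_k, b_k)).real
    return b_k, mse


# Illustrative matrix: strong diagonal (desired signal), weak off-diagonal (ICI).
G = [
    [1.0, 0.10j, -0.05],
    [-0.10j, 0.9, 0.10j],
    [0.05, -0.10j, 1.1],
]
for k in range(3):
    b_k, mse = mmse_coefficients(G, noise_variance=0.1, k=k)
    print(k, [complex(round(v.real, 4), round(v.imag, 4)) for v in b_k], round(mse, 4))
```

With no ICI ($\mathbf{G}$ equal to the identity matrix), the solution reduces to a one-tap scaling by $1/(1 + \sigma^2)$ and the minimum MSE becomes $\sigma^2 / (1 + \sigma^2)$.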


13.7 

BIBLIOGRAPHICAL NOTES AND REFERENCES 

In this chapter, we have considered a number of topics concerned with digital commu- 
nications over a fading multipath channel. We began with a statistical characterization 
of the channel and then described the ramifications of the channel characteristics on 
the design of digital signals and on their performance. We observed that the reliability 
of the communication system is enhanced by the use of diversity transmission and 
reception. We also considered the transmission of digital information through time- 
dispersive channels and described the RAKE demodulator, which is the matched filter 
for the channel. Finally, we considered the use of OFDM for mobile communications
and described the effect of ICI, caused by Doppler frequency spreading, on the
performance of an OFDM system.

The pioneering work on the characterization of fading multipath channels and
on signal and receiver design for reliable digital communications over such channels
was done by Price (1954, 1956). This work was followed by additional significant
contributions from Price and Green (1958, 1960), Kailath (1960, 1961), and Green
(1962). Diversity transmission and diversity combining techniques under a variety of
channel conditions have been considered in the papers by Pierce (1958), Brennan
(1959), Turin (1961, 1962), Pierce and Stein (1960), Barrow (1963), Bello and Nelin
(1962a, b, 1963), Price (1962a, b), and Lindsey (1964).

Our treatment of digital communications over fading channels focused primarily
on the Rayleigh fading channel model. For the most part, this is due to the wide
acceptance of this model for describing the fading effects on many radio channels and to
its mathematical tractability. Although other statistical models, such as the Ricean
fading model or the Nakagami fading model, may be more appropriate for characterizing
fading on some real channels, the general approach in the design of reliable
communications presented in this chapter carries over. Alouini and Goldsmith (1998), Simon
and Alouini (1998, 2000), and Annamalai et al. (1998, 1999) have presented a unified
approach to evaluating the error rate performance of digital communication systems
for various fading channel models. The effect of ICI in OFDM for mobile
communications has been extensively treated in the literature, e.g., the papers by Robertson
and Kaiser (1999), Li and Kavehrad (1999), Ciavaccini and Vitetta (2000), Li and
Cimini (2001), Stamoulis et al. (2002), and Wang et al. (2006). A general treatment
of wireless communications is given in the books by Rappaport (1996) and Stuber
(2000).


PROBLEMS 

13.1 The scattering function $S(\tau; \lambda)$ for a fading multipath channel is nonzero for the range
of values $0 \le \tau \le 1$ ms and $-0.1\ \text{Hz} \le \lambda \le 0.1\ \text{Hz}$. Assume that the scattering function
is approximately uniform in the two variables.

a. Give numerical values for the following parameters: 

(i) The multipath spread of the channel. 

(ii) The Doppler spread of the channel. 

(iii) The coherence time of the channel. 

(iv) The coherence bandwidth of the channel. 

(v) The spread factor of the channel. 

b. Explain the meaning of the following, taking into consideration the answers given 
in (a): 

(i) The channel is frequency-nonselective. 

(ii) The channel is slowly fading. 

(iii) The channel is frequency-selective. 

c. Suppose that we have a frequency allocation (bandwidth) of 10 kHz and we wish to
transmit at a rate of 100 bits/s over this channel. Design a binary communication system
with frequency diversity. In particular, specify

(i) The type of modulation. 

(ii) The number of subchannels. 

(iii) The frequency separation between adjacent carriers. 

(iv) The signaling interval used in your design. 

Justify your choice of parameters. 

13.2 Consider a binary communication system for transmitting a binary sequence over a fading 
channel. The modulation is orthogonal FSK with third-order frequency diversity (L = 3). 
The demodulator consists of matched filters followed by square-law detectors. Assume 
that the FSK carriers fade independently and identically according to a Rayleigh envelope 
distribution. The additive noises on the diversity signals are zero-mean Gaussian with
autocorrelation functions $E[z_k^*(t)z_k(t+\tau)] = 2N_0\delta(\tau)$. The noise processes are mutually
statistically independent.

a. The transmitted signal may be viewed as binary FSK with square-law detection,
generated by a repetition code of the form

$$1 \to \mathbf{c}_1 = [1\ 1\ 1], \qquad 0 \to \mathbf{c}_0 = [0\ 0\ 0]$$

Determine the error rate performance $P_{bh}$ for a hard-decision decoder following the
square-law-detected signals.

b. Evaluate $P_{bh}$ for $\bar{\gamma}_c = 100$ and 1000.

c. Evaluate the error rate $P_{bs}$ for $\bar{\gamma}_c = 100$ and 1000 if the decoder employs soft-decision
decoding.

d. Consider the generalization of the result in (a). If a repetition code of block length
$L$ ($L$ odd) is used, determine the error probability $P_{bh}$ of the hard-decision decoder
and compare that with $P_{bs}$, the error rate of the soft-decision decoder. Assume $\bar{\gamma}_c \gg 1$.

13.3 Suppose that the binary signal $\pm s_l(t)$ is transmitted over a fading channel and the received
signal is

$$r_l(t) = \pm a s_l(t) + z(t), \qquad 0 \le t \le T$$

where $z(t)$ is zero-mean white Gaussian noise with autocorrelation function

$$R_{zz}(\tau) = 2N_0\delta(\tau)$$

The energy in the transmitted signal is $\mathcal{E} = \frac{1}{2}\int_0^T |s_l(t)|^2\,dt$. The channel gain $a$ is specified
by the probability density function

$$p(a) = 0.1\delta(a) + 0.9\delta(a - 2)$$

a. Determine the average probability of error $P_b$ for the demodulator that employs a filter
matched to $s_l(t)$.

b. What value does $P_b$ approach as $\mathcal{E}/N_0$ approaches infinity?

c. Suppose that the same signal is transmitted on two statistically independently fading
channels with gains $a_1$ and $a_2$, where

$$p(a_k) = 0.1\delta(a_k) + 0.9\delta(a_k - 2), \qquad k = 1, 2$$

The noises on the two channels are statistically independent and identically distributed.
The demodulator employs a matched filter for each channel and simply adds the two
filter outputs to form the decision variable. Determine the average $P_b$.

d. For the case in (c) what value does $P_b$ approach as $\mathcal{E}/N_0$ approaches infinity?

13.4 A multipath fading channel has a multipath spread of $T_m = 1$ s and a Doppler spread
$B_d = 0.01$ Hz. The total channel bandwidth at bandpass available for signal transmission
is $W = 5$ Hz. To reduce the effects of intersymbol interference, the signal designer selects
a pulse duration $T = 10$ s.

a. Determine the coherence bandwidth and the coherence time. 

b. Is the channel frequency selective? Explain. 

c. Is the channel fading slowly or rapidly? Explain. 

d. Suppose that the channel is used to transmit binary data via (antipodal) coherently 
detected PSK in a frequency diversity mode. Explain how you would use the available 
channel bandwidth to obtain frequency diversity and determine how much diversity
is available.

e. For the case in (d), what is the approximate SNR required per diversity to achieve an
error probability of $10^{-6}$?

f. Suppose that a wideband signal is used for transmission and a RAKE-type receiver is
used for demodulation. How many taps would you use in the RAKE receiver?

g. Explain whether or not the RAKE receiver can be implemented as a coherent receiver
with maximal ratio combining.

h. If binary orthogonal signals are used for the wideband signal with square-law
post-detection combining in the RAKE receiver, what is the approximate SNR required to
achieve an error probability of $10^{-6}$? (Assume that all taps have the same SNR.)


13.5 In the binary communication system shown in Figure P13.5, $z_1(t)$ and $z_2(t)$ are statistically
independent white Gaussian noise processes with zero-mean and identical autocorrelation
functions $R_{zz}(\tau) = 2N_0\delta(\tau)$. The sampled values $U_1$ and $U_2$ represent the real parts of
the matched filter outputs. For example, if $s_l(t)$ is transmitted, then we have

$$U_1 = 2\mathcal{E} + N_1$$
$$U_2 = N_1 + N_2$$

where $\mathcal{E}$ is the transmitted signal energy and

$$N_k = \operatorname{Re}\left[\int_0^T s_l^*(t)z_k(t)\,dt\right], \qquad k = 1, 2$$

It is apparent that $U_1$ and $U_2$ are correlated Gaussian variables while $N_1$ and $N_2$ are
independent Gaussian variables. Thus,

$$p(n_1) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{n_1^2}{2\sigma^2}\right)$$
$$p(n_2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{n_2^2}{2\sigma^2}\right)$$

where the variance of $N_k$ is $\sigma^2 = 2\mathcal{E}N_0$.

a. Show that the joint probability density function for $U_1$ and $U_2$ is

$$p(u_1, u_2) = \frac{1}{2\pi\sigma^2}\exp\left\{-\frac{1}{\sigma^2}\left[(u_1 - 2\mathcal{E})^2 - u_2(u_1 - 2\mathcal{E}) + \tfrac{1}{2}u_2^2\right]\right\}$$

if $s(t)$ is transmitted and

$$p(u_1, u_2) = \frac{1}{2\pi\sigma^2}\exp\left\{-\frac{1}{\sigma^2}\left[(u_1 + 2\mathcal{E})^2 - u_2(u_1 + 2\mathcal{E}) + \tfrac{1}{2}u_2^2\right]\right\}$$

if $-s(t)$ is transmitted.

b. Based on the likelihood ratio, show that the optimum combination of $U_1$ and $U_2$ results
in the decision variable

$$U = U_1 + \beta U_2$$

where $\beta$ is a constant. What is the optimum value of $\beta$?

c. Suppose that $s(t)$ is transmitted. What is the probability density function of $U$?

d. What is the probability of error assuming that $s(t)$ was transmitted? Express your
answer as a function of the SNR $\mathcal{E}/N_0$.

e. What is the loss in performance if only $U = U_1$ is the decision variable?

FIGURE P13.5


13.6 Consider the model for a binary communication system with diversity as shown in
Figure P13.6. The channels have fixed attenuations and phase shifts. The $\{z_k(t)\}$ are
complex-valued white Gaussian noise processes with zero-mean and autocorrelation functions

$$R_{zz}(\tau) = E[z_k^*(t)z_k(t+\tau)] = 2N_{0k}\delta(\tau)$$

(Note that the spectral densities $\{N_{0k}\}$ are all different.) Also, the noise processes $\{z_k(t)\}$
are mutually statistically independent. The $\{\beta_k\}$ are complex-valued weighting factors to
be determined. The decision variable from the combiner is

$$U = \operatorname{Re}\left(\sum_k \beta_k U_k\right)$$

a. Determine the PDF $p(u)$ when $+1$ is transmitted.

b. Determine the probability of error $P_b$ as a function of the weights $\{\beta_k\}$.

c. Determine the values of $\{\beta_k\}$ that minimize $P_b$.

FIGURE P13.6



13.7 Determine the probability of error for binary orthogonal signaling with Lth-order diversity 
over a Rayleigh fading channel. The PDFs of the two decision variables are given by 
Equations 13.4-31 and 13.4-32. 

13.8 A binary sequence is transmitted via binary antipodal signaling over a Rayleigh fading
channel with $L$th-order diversity. When $s_l(t)$ is transmitted, the received equivalent
lowpass signals are

$$r_k(t) = \alpha_k e^{j\phi_k} s_l(t) + z_k(t), \qquad k = 1, 2, \ldots, L$$

The fading among the $L$ subchannels is statistically independent. The additive noise
terms $\{z_k(t)\}$ are zero-mean, statistically independent, and identically distributed white
Gaussian noise processes with autocorrelation function $R_{zz}(\tau) = 2N_0\delta(\tau)$. Each of the
$L$ signals is passed through a filter matched to $s_l(t)$ and the output is phase-corrected to
yield

$$U_k = \operatorname{Re}\left[e^{-j\phi_k}\int_0^T r_k(t)s_l^*(t)\,dt\right], \qquad k = 1, 2, \ldots, L$$

The $\{U_k\}$ are combined by a linear combiner to form the decision variable

$$U = \sum_{k=1}^{L} U_k$$

a. Determine the PDF of $U$ conditional on fixed values for the $\{\alpha_k\}$.

b. Determine the expression for the probability of error when the $\{\alpha_k\}$ are statistically
independent and identically distributed Rayleigh random variables.


13.9 The Chernov bound for the probability of error for binary FSK with diversity $L$ in Rayleigh
fading was shown to be

$$P_2(L) < [4p(1-p)]^L = \left[\frac{4(1+\bar{\gamma}_c)}{(2+\bar{\gamma}_c)^2}\right]^L = 2^{-\bar{\gamma}_b g(\bar{\gamma}_c)}$$

where

$$g(\bar{\gamma}_c) = \frac{1}{\bar{\gamma}_c}\log_2\left[\frac{(2+\bar{\gamma}_c)^2}{4(1+\bar{\gamma}_c)}\right]$$

a. Plot $g(\bar{\gamma}_c)$ and determine its approximate maximum value and the value of $\bar{\gamma}_c$ where
the maximum occurs.

b. For a given $\bar{\gamma}_b$, determine the optimal order of diversity.

c. Compare $P_2(L)$, under the condition that $g(\bar{\gamma}_c)$ is maximized (optimal diversity), with
the error probability for binary FSK and AWGN with no fading, which is

$$P_2 = \tfrac{1}{2}e^{-\gamma_b/2}$$

and determine the penalty in SNR due to fading and noncoherent (square-law)
combining.


13.10 A DS spread spectrum system is used to resolve the multipath signal components in a 
two-path radio signal propagation scenario. If the path length of the secondary path is 
300 m longer than that of the direct path, determine the minimum chip rate necessary to 
resolve the multipath components. 



13.11 A baseband digital communication system employs the signals shown in Figure P13.11(a)
for the transmission of two equiprobable messages. It is assumed that the communication
problem studied here is a “one-shot” communication problem; that is, the above messages
are transmitted just once and no transmission takes place afterward. The channel has no
attenuation ($\alpha = 1$), and the noise is AWGN with power spectral density $\frac{1}{2}N_0$.

a. Find an appropriate orthonormal basis for the representation of the signals. 

b. In a block diagram, give the precise specifications of the optimum receiver using 
matched filters. Label the diagram carefully. 

c. Find the error probability of the optimum receiver. 

d. Show that the optimum receiver can be implemented by using just one filter (see the
block diagram in Figure P13.11(b)). What are the characteristics of the matched filter,
the sampler and decision device?

e. Now assume that the channel is not ideal but has an impulse response of $c(t) =
\delta(t) + \frac{1}{2}\delta(t - \frac{1}{2}T)$. Using the same matched filter as in (d), design the optimum
receiver.

f. Assuming that the channel impulse response is $c(t) = \delta(t) + a\delta(t - \frac{1}{2}T)$, where $a$ is
a random variable uniformly distributed on [0, 1], and using the same matched filter
as in (d), design the optimum receiver.


FIGURE P13.11
(a) The signals $s_1(t)$ and $s_2(t)$. (b) Single-filter receiver for the AWGN channel.


13.12 A communication system employs dual antenna diversity and binary orthogonal FSK 
modulation. The received signals at the two antennas are 

$$r_1(t) = \alpha_1 s(t) + n_1(t)$$
$$r_2(t) = \alpha_2 s(t) + n_2(t)$$

where $\alpha_1$ and $\alpha_2$ are statistically iid Rayleigh random variables, and $n_1(t)$ and $n_2(t)$ are
statistically independent, zero-mean and white Gaussian random processes with
power-spectral density $\frac{1}{2}N_0$. The two signals are demodulated, squared, and then combined
(summed) prior to detection. 

a. Sketch the functional block diagram of the entire receiver, including the demodulator, 
the combiner, and the detector. 

b. Plot the probability of error for the detector and compare the result with the case of 
no diversity. 

13.13 The two equivalent lowpass signals shown in Figure P13.13 are used to transmit a binary
sequence. The equivalent lowpass impulse response of the channel is $h(t) = 4\delta(t) -
2\delta(t - T)$. To avoid pulse overlap between successive transmissions, the transmission rate
in bits/s is selected to be $R = 1/2T$. The transmitted signals are equally probable and
are corrupted by additive zero-mean white Gaussian noise having an equivalent lowpass
representation $z(t)$ with an autocorrelation function

$$R_{zz}(\tau) = E[z^*(t)z(t+\tau)] = 2N_0\delta(\tau)$$

a. Sketch the two possible equivalent lowpass noise-free received waveforms. 

b. Specify the optimum receiver and sketch the equivalent lowpass impulse responses of 
all filters used in the optimum receiver. Assume coherent detection of the signals. 


FIGURE P13.13

13.14 Verify the relation in Equation 13.3-14 by making the change of variable $\gamma = \alpha^2\mathcal{E}_b/N_0$
in the Nakagami-$m$ distribution.

13.15 Consider a digital communication system that uses two transmitting antennas and one 
receiving antenna. The two transmitting antennas are sufficiently separated so as to pro- 
vide dual spatial diversity in the transmission of the signal. The transmission scheme is 
as follows: If $s_1$ and $s_2$ represent a pair of symbols from either a one-dimensional or a
two-dimensional signal constellation, which are to be transmitted by the two antennas,
the signal from the first antenna over two signal intervals is $(s_1, s_2^*)$ and from the second
antenna the transmitted signal is $(s_2, -s_1^*)$. The signal received by the single receiving
antenna over the two signal intervals is

$$r_1 = h_1 s_1 + h_2 s_2 + n_1$$
$$r_2 = h_1 s_2^* - h_2 s_1^* + n_2$$

where $(h_1, h_2)$ represent the complex-valued channel path gains, which may be assumed
to be zero-mean, complex Gaussian with unit variance and statistically independent. The
channel path gains $(h_1, h_2)$ are assumed to be constant over the two signal intervals and
known to the receiver. The terms $(n_1, n_2)$ represent additive white Gaussian noise terms
that have zero mean and variance $\sigma^2$ and are uncorrelated.

a. Show how to recover the transmitted symbols $(s_1, s_2)$ from $(r_1, r_2)$ and achieve dual
diversity reception.

b. If the energy in the pair $(s_1, s_2)$ is $(\mathcal{E}_s, \mathcal{E}_s)$ and the modulation is binary PSK, determine
the probability of error.

c. Repeat (b) if the modulation is QPSK. 

13.16 In the suppression of ICI in an OFDM system, the received signal vector for the $m$th
frame may be expressed as

$$\mathbf{r}(m) = \mathbf{H}(m)\mathbf{W}\mathbf{s}(m) + \mathbf{n}(m)$$

where $\mathbf{W}$ is the $N \times N$ IDFT transformation matrix, $\mathbf{s}(m)$ is the $N \times 1$ signal vector, $\mathbf{n}(m)$
is the zero-mean, Gaussian noise vector with iid components, and $\mathbf{H}(m)$ is the $N \times N$
channel impulse response matrix, defined as

$$\mathbf{H}(m) = \left[\mathbf{h}^H(0, m)\ \ \mathbf{h}^H(1, m)\ \cdots\ \mathbf{h}^H(N-1, m)\right]^H$$

where $\mathbf{h}(n, m)$ is the right cyclic shift by $n + 1$ positions of the zero-padded channel
impulse response vector of dimension $N \times 1$.

By expressing the DFT of $\mathbf{r}(m)$ by $\mathbf{R}(m)$, derive the relations in Equations 13.6-24,
13.6-25, and 13.6-27, where $\mathbf{G}(m)$ is defined in Equation 13.6-26.

13.17 Prove the result given in Equation 13.6-17. 

13.18 Prove the result given in Equation 13.6-18. 



Fading Channels II: Capacity and Coding 


This chapter studies capacity and coding aspects for fading channels. In Chapter 13
the physical sources of the fading phenomenon in communications were discussed, and
different models for fading channels were introduced. In particular, we saw that the
effect of fading can be expressed in terms of the multipath spread of the channel denoted
by $T_m$ and the Doppler spread of the channel denoted by $B_d$. Equivalently we can use
the coherence bandwidth and the coherence time of the channel denoted by $(\Delta f)_c$ and
$(\Delta t)_c$, respectively. If two narrow pulses are separated by less than the coherence time
of the channel, they will experience the same fading effects; and if two frequency tones
are separated by less than the coherence bandwidth, they will be affected by the same
fading effects. If the signal bandwidth is much larger than the coherence bandwidth of
the channel, i.e., if $W \gg (\Delta f)_c$, then we have a frequency-selective channel model; and
if $W \ll (\Delta f)_c$, then the channel model is frequency-nonselective or flat in frequency.
In this case all frequency components of the input signal experience the same fading
effects. Similarly if the signal duration is much longer than the channel coherence time,
i.e., $T \gg (\Delta t)_c$, the signal will be subject to different fading effects and we have a fast
fading channel; and if $T \ll (\Delta t)_c$ we have a slowly fading channel, or the channel is
flat in time. Since the bandwidth and the duration of a signal are related through the
approximate relation $W \approx 1/T$, we conclude that if in a channel $T_m B_d \ll 1$, i.e., if the
channel is underspread, then we can choose a signal bandwidth $W$ such that for this
signal the channel is flat in both time and frequency.†

†We are excluding the spread spectrum systems in which $W \approx 1/T_c$, where $T_c$ is the chip interval.

In dealing with capacity and coding for fading channels, we need to study channel
variations during transmission of a block of signal waveforms transmitted over
the channel. We can distinguish two different possibilities. In one case the
characteristics of the channel change fast enough with respect to the transmission duration of
a block that a single block of information experiences all possible realizations of the
channel frequently. In this case the time averages during the transmission duration of
a single block are equal to the statistical (ensemble) averages over all possible channel
realizations. Another possibility is that the block duration is short and each block
experiences only a cross section of channel characteristics. In this model, the channel
remains relatively constant during the transmission of one block, and we can say that
each block experiences a single state of the channel and the following blocks
experience different channel states. The notions of channel capacity in these two cases are
quite different. In the first channel model, since all channel realizations are experienced
during a block, an ergodic channel model is appropriate and ergodic capacity can be
defined as the ensemble average of channel capacity over all possible channel
realizations. In the second channel model, where in each block different channel realizations
are experienced, for each block the capacity will be different. Thus, the capacity can
best be modeled as a random variable. In this case another notion of capacity known
as outage capacity is more appropriate.

Another parameter that affects the capacity of fading channels is whether
information about the state of the channel is available at the transmitter and/or the receiver.
State information at the receiver, which is usually measured by transmitting
tones over the channel at different frequencies, helps the receiver in increasing the
channel capacity since the state of the channel can be interpreted as an auxiliary channel
output. Availability of the state information at the transmitter makes it possible for the
transmitter to design its signal to match the state of the channel through some kind of
precoding. In this case the transmitter can change the level of the transmitted power
according to the channel state, thus conserving valuable power while the channel is in
a deep fade and saving it for transmission during periods when
the channel does not highly attenuate the transmitted signal.

Coding for fading channels introduces new challenges and opportunities that are 
different from the standard additive white Gaussian noise channels. As we will see in 
this chapter, the metrics that determine the performance of coding schemes over fading 
channels are different from the standard metrics used to compare the performance of 
different coding schemes over additive white Gaussian noise channels. On the other 
hand, since coding techniques introduce redundancy through transmission of the parity 
check codes, the extra transmissions provide diversity that improves the performance 
of coded systems over fading channels. 

In this chapter we study the case of single-antenna systems from an information- 
theoretic and coding point of view. The study of capacity and coding for multiple- 
antenna systems and the design and analysis of space-time codes are done in Chapter 15. 


14.1

CAPACITY OF FADING CHANNELS 

The capacity of a channel is defined as the supremum of the rates at which reliable com- 
munication over the channel is possible. Reliable communication at rate R is possible 
if there exists a sequence of codes with rate R for which the average error probability 
tends to zero as the block length of the code increases. In other words, at any rate less 
than capacity we can find a code whose error probability is less than any specified $\epsilon > 0$.
In Chapter 6 we gave a general expression for the capacity of a discrete memoryless
channel in the form

$$C = \max_{p(x)} I(X; Y)$$ (14.1-1)

where the maximum is taken over all channel input probability density functions. For
a power-constrained discrete-time AWGN channel, the capacity can be expressed as

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$$ (14.1-2)

where $P$ is the signal power, $N$ is the noise power, and $C$ is the capacity in bits per
transmission, or bits per (real) dimension. For a complex-input complex-output channel
with circular complex Gaussian noise with noise variance $N_0$, or $N_0/2$ per real and
imaginary components, the capacity is given by

$$C = \log\left(1 + \frac{P}{N_0}\right)$$ (14.1-3)

bits per complex dimension.

The capacity of an ideal band-limited, power-limited additive white Gaussian
waveform channel is given by

$$C = W\log\left(1 + \frac{P}{N_0 W}\right)$$ (14.1-4)

where $W$ denotes the bandwidth, $P$ denotes the signal power, and $N_0/2$ is the noise
power spectral density. The capacity $C$ in this case is given in bits per second. For an
infinite-bandwidth channel in which the signal-to-noise ratio $P/(N_0 W)$ tends to zero,
the capacity is given in Equation 6.5-44 as

$$C = \frac{1}{\ln 2}\frac{P}{N_0} \approx 1.44\,\frac{P}{N_0}$$ (14.1-5)


The capacity in bits/sec/Hz (or bits per complex dimension) which determines the
highest achievable spectral bit rate is given by

$$C = \log(1 + \text{SNR})$$ (14.1-6)

where SNR denotes the signal-to-noise ratio defined as

$$\text{SNR} = \frac{P}{N_0 W}$$ (14.1-7)

Note that since $W \approx 1/T_s$, where $T_s$ is the symbol duration, the above expression for
SNR can be written as $\text{SNR} = \frac{P T_s}{N_0} = \frac{\mathcal{E}_s}{N_0}$, where $\mathcal{E}_s$ indicates energy per symbol. In an
AWGN channel the capacity is achieved by using a Gaussian input probability density
function. At low values of SNR we have

$$C \approx \frac{1}{\ln 2}\,\text{SNR} \approx 1.44\,\text{SNR}$$ (14.1-8)
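
The relations in Equations 14.1-4, 14.1-6, and 14.1-8 are easy to check numerically. The Python sketch below assumes that the logarithms are taken to base 2, so that capacities come out in bits; the function names are illustrative and are not part of the text.

```python
import math


def awgn_capacity_bits(snr):
    """Equation 14.1-6 with a base-2 logarithm: bits per complex dimension."""
    return math.log2(1.0 + snr)


def low_snr_approx_bits(snr):
    """Equation 14.1-8: C is approximately SNR / ln 2 at low SNR."""
    return snr / math.log(2.0)


def waveform_capacity_bps(bandwidth_hz, power, n0):
    """Equation 14.1-4 with a base-2 logarithm: bits per second."""
    return bandwidth_hz * math.log2(1.0 + power / (n0 * bandwidth_hz))
```

For example, `awgn_capacity_bits(0.01)` and `low_snr_approx_bits(0.01)` differ by less than $10^{-4}$ bit, and `waveform_capacity_bps` approaches the limit $P/(N_0 \ln 2)$ of Equation 14.1-5 as the bandwidth grows with $P/N_0$ held fixed.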


†We use the notation $\mathcal{CN}(0, \sigma^2)$ to denote a circular complex random variable with variance $\sigma^2/2$ per real
and imaginary parts.

The notion of capacity for a band-limited additive white Gaussian noise channel
can be extended to a nonideal channel in which the channel frequency response is
denoted by $C(f)$. In this case the channel is described by the input-output relation of
the form

$$y(t) = x(t) \star c(t) + n(t)$$ (14.1-9)

where $\star$ denotes convolution, $c(t)$ denotes the channel impulse response, and
$C(f) = \mathscr{F}[c(t)]$ is the channel frequency response. The noise is Gaussian with a
power spectral density of $S_n(f)$. It was shown in Chapter 11 that the capacity of this
channel is given by

$$C = \frac{1}{2}\int_{-\infty}^{\infty}\log\left(1 + \frac{P(f)|C(f)|^2}{S_n(f)}\right)df$$ (14.1-10)

where $P(f)$, the input power spectral density, is selected such that

$$P(f) = \left(K - \frac{S_n(f)}{|C(f)|^2}\right)^+$$ (14.1-11)

where $x^+$ is defined by

$$x^+ = \max\{0, x\}$$ (14.1-12)

and $K$ is selected such that

$$\int_{-\infty}^{\infty} P(f)\,df = P$$ (14.1-13)

The water-filling interpretation of this result states that the input power should be
allocated to different frequencies in such a way that more power is transmitted at those
frequencies at which the channel exhibits a higher signal-to-noise ratio and less power
is sent at the frequencies with poor signal-to-noise ratio. A graphical interpretation of
the water-filling process is shown in Figure 14.1-1.

The water-filling argument can also be applied to communication over parallel
channels. If $N$ parallel discrete-time AWGN channels have noise powers $N_i$, $1 \le i \le N$,
and an overall power constraint of $P$, then the total capacity of the parallel channels is
given by

$$C = \frac{1}{2}\sum_{i=1}^{N}\log\left(1 + \frac{P_i}{N_i}\right)$$ (14.1-14)

where the $P_i$'s are selected such that

$$P_i = (K - N_i)^+$$ (14.1-15)

subject to

$$\sum_{i=1}^{N} P_i = P$$ (14.1-16)
FIGURE 14.1-1
The water-filling interpretation of the channel capacity.

In addition to frequency selectivity, which can be treated through water-filling
arguments, a fading channel is characterized by time variations in channel characteristics,
i.e., time selectivity. Since the capacity is defined in the limiting sense as the block
length of the code tends to infinity, we can always argue that even in a slowly fading
channel the block length can be selected large enough that in any block the channel
experiences all possible states, and hence the time averages over one block are equal to
the statistical averages. However, from a practical point of view, this would introduce
a large delay which is not acceptable in many applications, for instance, speech
communication on cellular phones. Therefore, for a delay-constrained system on a slowly
fading channel, the ergodicity assumption is not valid.

A common practice to break the inherent memory in fading channels is to em- 
ploy long interleavers that spread a code sequence across a long period of time, thus 
making individual symbols experience independent fading. However, employing long 
interleavers would also introduce unacceptable delay in many applications. These ob- 
servations make it clear that the notion of capacity is more subtle in the study of fading 
channels, and depending on the coherence time of the channel and the maximum delay 
acceptable in the application under study, different channel models and different no- 
tions of channel capacity need to be considered. Since fading channels can be modeled 
as channels whose state changes, we first study the capacity of these channels. 


14.1-1 Capacity of Finite-State Channels 

A finite-state channel is a channel model for a communication environment that varies
with time. We assume that in each transmission interval the state of the channel is
selected independently from a set of possible states according to some probability
distribution on the space of channel states. The model for a finite-state channel is
shown in Figure 14.1-2.

FIGURE 14.1-2
A finite-state channel.

In this channel model, in each transmission the output $y \in \mathscr{Y}$ depends on the input
$x \in \mathscr{X}$ and the state of the channel $s \in \mathscr{S}$ through the conditional PDF $p(y|x, s)$. The
sets $\mathscr{X}$, $\mathscr{Y}$, and $\mathscr{S}$ denote the input, the output, and the state alphabets, respectively,
and are assumed to be discrete sets. The state of the channel is generated independent
of the channel input according to

$$p(\mathbf{s}) = \prod_{i=1}^{n} p(s_i)$$ (14.1-17)

and the channel is memoryless, i.e.,

$$p(\mathbf{y}|\mathbf{x}, \mathbf{s}) = \prod_{i=1}^{n} p(y_i|x_i, s_i)$$ (14.1-18)

The encoder and the decoder have access to noisy versions of the state denoted by
$u \in \mathscr{U}$ and $v \in \mathscr{V}$, respectively. Based on an original idea of Shannon (1958), Salehi
(1992), and Caire and Shamai (1999) have shown that the capacity of this channel can
be given as

$$C = \max_{p(t)} I(T; Y|V)$$ (14.1-19)

In this expression the maximization is over $p(t)$, the set of all probability mass functions
on $\mathscr{T}$, where $\mathscr{T}$ denotes the set of all vectors of length $|\mathscr{U}|$ with components from
$\mathscr{X}$. The cardinality of the set $\mathscr{T}$ is $|\mathscr{X}|^{|\mathscr{U}|}$, and the set $\mathscr{T}$ is called the set of input
strategies.

In the study of fading channels, certain cases of this channel model are of
particular interest. The special case where $V = S$ and $U$ is a degenerate random variable
corresponds to the case when complete channel state information (CSI) is available at
the receiver and no channel state information is available at the transmitter. In this case
the capacity reduces to

$$C = \max_{p(x)} I(X; Y|S)$$ (14.1-20)

where

$$p(s, x, y) = p(s)p(x)p(y|x, s)$$ (14.1-21)

Note that since

$$I(X; Y|S) = \sum_{s} p(s)\,I(X; Y|S = s)$$ (14.1-22)

the capacity can be interpreted as the maximum over all input distributions of the
average of the mutual information over all channel states. A second interesting case
occurs when the state information is available at both the transmitter and the receiver.
In this case

$$C = \max_{p(x|s)} I(X; Y|S) = \sum_{s} p(s)\max_{p(x|s)} I(X; Y|S = s)$$ (14.1-23)

where the maximization is on all joint probabilities of the form

$$p(s, x, y) = p(s)p(x|s)p(y|x, s)$$ (14.1-24)

Clearly since in this case the state information is available at the transmitter, the encoder 
can choose the input distribution based on the knowledge of the state. Since for each state 
of the channel the input distribution is selected to maximize the mutual information 
in that state, the channel capacity is the expected value of the capacities. A third 
interesting case occurs when complete channel information is available at the receiver
but the receiver transmits only a deterministic function of it to the transmitter. In this
case $v = s$ and $u = g(s)$, where $g(\cdot)$ denotes a deterministic function. In this case the
capacity is given by [see Caire and Shamai (1999)]

$$C = \sum_{u} p(u)\max_{p(x|u)} I(X; Y|S, U = u)$$ (14.1-25)

This case corresponds to when the receiver can estimate the channel state but due to 
communication constraints over the feedback channel can transmit only a quantized 
version of the state information to the transmitter. 

The underlying memoryless assumption in these cases makes these models appro- 
priate for a fully interleaved fading channel. 
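
The difference between Equations 14.1-20 and 14.1-23 can be illustrated with a small numerical sketch. The two-state channel below (a Z-channel and its mirror image, each with probability 1/2) is a hypothetical example chosen for illustration, not one taken from the text, and the grid search over the binary input distribution is a simplification that only works for binary inputs. The function names are likewise assumptions.

```python
import math

def mutual_information(q, W):
    """I(X;Y) in bits for a binary-input channel with P(X=1) = q.

    W[x][y] is the transition probability p(y | x) in one channel state.
    """
    px = [1.0 - q, q]
    num_outputs = len(W[0])
    py = [sum(px[x] * W[x][y] for x in range(2)) for y in range(num_outputs)]
    info = 0.0
    for x in range(2):
        for y in range(num_outputs):
            joint = px[x] * W[x][y]
            if joint > 0.0:
                info += joint * math.log2(W[x][y] / py[y])
    return info

def capacities(state_probs, channels, grid=1000):
    """Return (capacity with CSI at receiver only, capacity with CSI at both sides)."""
    qs = [i / grid for i in range(grid + 1)]
    # Equation 14.1-20: a single input distribution is used in every state.
    c_rx = max(
        sum(p * mutual_information(q, W) for p, W in zip(state_probs, channels))
        for q in qs
    )
    # Equation 14.1-23: the input distribution is optimized separately per state.
    c_both = sum(
        p * max(mutual_information(q, W) for q in qs)
        for p, W in zip(state_probs, channels)
    )
    return c_rx, c_both

# Hypothetical two-state example: a Z-channel and its mirror image.
z_channel = [[1.0, 0.0], [0.5, 0.5]]
flipped_z = [[0.5, 0.5], [0.0, 1.0]]
c_rx, c_both = capacities([0.5, 0.5], [z_channel, flipped_z])
print(f"CSI at receiver only: {c_rx:.4f} bits")
print(f"CSI at both sides:    {c_both:.4f} bits")
```

Because the two states favor different input distributions, the transmitter gains a little by knowing the state; when every state has the same optimal input distribution the two capacities coincide.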


14.2 ERGODIC AND OUTAGE CAPACITY

To study the difference between ergodic and outage capacity, consider the two-state
channel shown in Figure 14.2-1. In this figure two binary symmetric channels, one with
crossover probability $p = 0$ and one with crossover probability $p = 1/2$, are shown.
We consider two different channel models based on this figure.

1. In channel model 1 the input and output switches choose the top channel (BSC 1)
with probability $\delta$ and the bottom channel (BSC 2) with probability $1 - \delta$,
independently for each transmission. In this channel model each symbol is transmitted
independently of the previous symbols, and the state of the channel is also selected
independently for each symbol.

2. In channel model 2 the top and the bottom channels are selected at the beginning of
the transmission with probabilities $\delta$ and $1 - \delta$, respectively; but once a channel is
selected, it will not change for the entire transmission period.
FIGURE 14.2-1

A two-state channel: BSC 1 with $p = 0$ (top) and BSC 2 with $p = 1/2$ (bottom).

From Chapter 6 we know that the capacities of the top and bottom channels are $C_1 = 1$
and $C_2 = 0$ bits per transmission, respectively. To find the capacity of the first channel
model, we note that since in this case the channel is selected independently for the
transmission of each symbol, over a long block the channel will experience both BSC
component channels according to their corresponding probabilities. In this case time
and ensemble averages can be interchanged, the notion of ergodic capacity, denoted
by $\bar{C}$, applies, and the results of the preceding section can be used. The capacity of
this channel model depends on the availability of the state information. We distinguish
three cases for the first channel model.

1. Case 1: No channel state information is available at the transmitter or receiver. In
this case it is easy to verify that the average channel is a binary symmetric channel
with crossover probability of $(1 - \delta)/2$, and hence the ergodic capacity is

$$\bar{C} = 1 - H_b\!\left(\frac{1 - \delta}{2}\right) \tag{14.2-1}$$

where $H_b(\cdot)$ denotes the binary entropy function.

2. Case 2: Channel state information is available at the receiver. Using Equation 14.1-22,
we observe that in this case we maximize the mutual information with a fixed
input distribution. But since regardless of the state of the channel a uniform input
distribution maximizes the mutual information, the ergodic capacity of the channel
is the average of the two capacities, i.e.,

$$\bar{C} = \delta C_1 + (1 - \delta) C_2 = \delta \tag{14.2-2}$$

3. Case 3: Channel state information is available at the transmitter and the receiver.
Here we use Equation 14.1-23 to find the channel capacity. In this case we can
maximize the mutual information individually for each state, and the capacity is the
average of the capacities as given in Equation 14.2-2.

A plot of the two capacities as a function of $\delta$ is given in Figure 14.2-2. Note that
in this particular channel, since the capacity achieving input distribution for the two
channel states is the same, the results of cases 2 and 3 are the same. In general the
capacities in these cases are different, as shown in Problem 14.7.

FIGURE 14.2-2

The ergodic capacity of channel model 1.

In the second channel model, where one of the two channels BSC 1 or BSC 2 is
selected only once and then used for the entire transmission, the capacity
in the Shannon sense is zero. In fact it is not possible to communicate reliably over this
channel model at any positive rate. The reason is that if we transmit at a rate $R > 0$ and
channel BSC 2 is selected, the error probability cannot be set arbitrarily small. Since
channel BSC 2 is selected with a probability of $1 - \delta > 0$, reliable communication at
any rate $R > 0$ is impossible. In fact in this case the channel capacity is a binary random
variable which takes values of 1 and 0 with probabilities $\delta$ and $1 - \delta$, respectively. This
is a case for which ergodic capacity is not applicable and a new notion of capacity
called outage capacity is more appropriate (Ozarow et al. (1994)).

We note that since the channel capacity in this case is a random variable, if we
transmit at a rate $R > 0$, there is a certain probability that the rate exceeds the capacity
and the channel will be in outage. The probability of this event is called the outage
probability and is given by

$$P_{\text{out}}(R) = P[C < R] = F_C(R^-) \tag{14.2-3}$$

where $F_C(c)$ denotes the CDF of the random variable $C$ and $F_C(R^-)$ is the limit-from-left
of $F_C(c)$ at point $c = R$.

For any $0 < \epsilon < 1$ we can define $C_\epsilon$, the $\epsilon$-outage capacity of the channel, as the
highest transmission rate that keeps the outage probability at or below $\epsilon$, i.e.,

$$C_\epsilon = \max\{R : P_{\text{out}}(R) \le \epsilon\} \tag{14.2-4}$$

In channel model 2, the $\epsilon$-outage capacity of the channel is given by

$$C_\epsilon = \begin{cases} 0 & \text{for } 0 < \epsilon < 1 - \delta \\ 1 & \text{for } 1 - \delta \le \epsilon < 1 \end{cases} \tag{14.2-5}$$
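
The two-state example can be summarized numerically. The sketch below evaluates the ergodic capacity of channel model 1 without CSI (a BSC with crossover probability $(1 - \delta)/2$) and with CSI at the receiver (the average $\delta$), together with the $\epsilon$-outage capacity of channel model 2. The function names are illustrative assumptions, not notation from the text.

```python
import math

def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def ergodic_capacity_model1(delta, csi_at_receiver):
    """Ergodic capacity of channel model 1 in bits per transmission."""
    if csi_at_receiver:
        # Average of the per-state capacities C1 = 1 and C2 = 0.
        return delta * 1.0 + (1.0 - delta) * 0.0
    # Without CSI the average channel is a BSC with crossover (1 - delta) / 2.
    return 1.0 - binary_entropy((1.0 - delta) / 2.0)

def outage_capacity_model2(delta, eps):
    """Epsilon-outage capacity of channel model 2.

    The capacity is 1 with probability delta and 0 with probability 1 - delta,
    so any positive rate is in outage with probability 1 - delta.
    """
    return 1.0 if eps >= 1.0 - delta else 0.0

for delta in (0.25, 0.5, 0.9):
    no_csi = ergodic_capacity_model1(delta, csi_at_receiver=False)
    with_csi = ergodic_capacity_model1(delta, csi_at_receiver=True)
    print(f"delta={delta}: no CSI {no_csi:.4f}, CSI at receiver {with_csi:.4f}")
```

For every $\delta$ strictly between 0 and 1 the capacity without CSI is below $\delta$, which is the gap visible between the two curves of the ergodic-capacity plot.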


14.2-1 The Ergodic Capacity of the Rayleigh Fading Channel 


In this section we study the ergodic capacity of the Rayleigh fading channel. The 
underlying assumption is that the channel coherence time and the delay restrictions of 
the channel are such that perfect interleaving is possible and the discrete-time equivalent 
of the channel can be modeled as a memoryless AWGN channel with independent
Rayleigh channel coefficients. The lowpass discrete-time equivalent of this channel is
described by an input-output relation of the form

$$y_i = R_i x_i + n_i \tag{14.2-6}$$

where $x_i$ and $y_i$ are the complex input and output of the channel, the $R_i$ are iid complex
random variables with Rayleigh distributed magnitude and uniform phase, and the $n_i$ are
iid random variables drawn according to $\mathcal{CN}(0, N_0)$. The PDF of the magnitude of $R_i$
is given by

$$p(r) = \begin{cases} \dfrac{r}{\sigma^2}\, e^{-r^2/(2\sigma^2)} & r > 0 \\ 0 & \text{otherwise} \end{cases} \tag{14.2-7}$$

We know from Chapter 2, Equations 2.3-45 and 2.3-27, that $R^2$ is an exponential
random variable with expected value $E[R^2] = 2\sigma^2$. Therefore, if $\rho = |R_i|^2$, then from
Equation 2.3-27 we have

$$p(\rho) = \begin{cases} \dfrac{1}{2\sigma^2}\, e^{-\rho/(2\sigma^2)} & \rho > 0 \\ 0 & \text{otherwise} \end{cases} \tag{14.2-8}$$

and since the received power is proportional to $\rho$, we have

$$P_r = 2\sigma^2 P_t \tag{14.2-9}$$

where $P_t$ and $P_r$ denote the transmitted and the received power, respectively. In the
following discussion we assume that $2\sigma^2 = 1$, thus $P_t = P_r = P$. The extension of
the results to the general case is straightforward.
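
As a quick sanity check of this model, the following Monte Carlo sketch draws complex Gaussian fading coefficients with variance $\sigma^2$ per real dimension and checks that $\rho = |R|^2$ behaves like an exponential random variable with mean $2\sigma^2$. The sample size, the seed, and the helper name are arbitrary choices made for illustration.

```python
import math
import random

def rayleigh_power_samples(sigma2, count, seed=1):
    """Draw samples of rho = |R|^2 where R is circular complex Gaussian.

    Each of the real and imaginary parts has variance sigma2, so |R| is
    Rayleigh distributed and E[|R|^2] = 2 * sigma2.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    samples = []
    for _ in range(count):
        re = rng.gauss(0.0, sigma)
        im = rng.gauss(0.0, sigma)
        samples.append(re * re + im * im)
    return samples

sigma2 = 0.5  # the normalization 2 * sigma^2 = 1 used in the text
rho = rayleigh_power_samples(sigma2, 200_000)
mean_rho = sum(rho) / len(rho)
# For a unit-mean exponential variable, P[rho > 1] = exp(-1).
frac_above_one = sum(1 for value in rho if value > 1.0) / len(rho)
print(f"sample mean of rho: {mean_rho:.3f}")
print(f"P[rho > 1] estimate: {frac_above_one:.3f} (exp(-1) = {math.exp(-1):.3f})")
```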

Depending on the availability of channel state information at the transmitter and 
receiver, we study the ergodic channel capacity in three cases. 


No Channel State Information In this case the receiver knows neither the magnitude
nor the phase of the fading coefficients $R_i$; hence no information can be transmitted
on the phase of the input signal. The input-output relation for the channel is given by

$$y = Rx + n \tag{14.2-10}$$

where $R$ and $n$ are independent circular complex Gaussian random variables drawn
according to $\mathcal{CN}(0, 2\sigma^2)$ and $\mathcal{CN}(0, N_0)$, respectively.

To determine the capacity of the channel in this case, we need to derive an expression
for $p(y \mid x)$, which can be written as

$$p(y \mid x) = \frac{1}{2\pi} \int_0^{2\pi}\!\! \int_0^{\infty} p(y \mid x, r, \theta)\, p(r)\, dr\, d\theta \tag{14.2-11}$$

where $p(r)$ is given by Equation 14.2-7 and

$$p(y \mid x, r, \theta) = \frac{1}{\pi N_0}\, e^{-|y - r e^{j\theta} x|^2 / N_0} \tag{14.2-12}$$
It can be shown (see Problem 14.8) that Equation 14.2-11 simplifies to

$$p(y \mid x) = \frac{1}{\pi (N_0 + |x|^2)}\, e^{-|y|^2 / (N_0 + |x|^2)} \tag{14.2-13}$$

This relation clearly shows that all the phase information is lost.
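
One way to see where this conditional density comes from, without evaluating the double integral, is the following shortcut. It is a sketch under the normalization $2\sigma^2 = 1$ and is not necessarily the route intended in Problem 14.8. For a fixed input $x$, the output $y = Rx + n$ is the sum of two independent zero-mean circular complex Gaussian random variables, so it is itself zero-mean circular complex Gaussian and only its variance needs to be computed:

```latex
\begin{align*}
E\bigl[|y|^2 \mid x\bigr] &= |x|^2\, E\bigl[|R|^2\bigr] + E\bigl[|n|^2\bigr] = |x|^2 + N_0, \\
y \mid x &\sim \mathcal{CN}\bigl(0,\; N_0 + |x|^2\bigr), \\
p(y \mid x) &= \frac{1}{\pi \left(N_0 + |x|^2\right)}
               \exp\!\left(-\frac{|y|^2}{N_0 + |x|^2}\right).
\end{align*}
```

The density depends on $x$ only through $|x|^2$, which is why the phase of the input carries no information.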

It has been shown by Abou-Faycal et al. (2001) that when an input power constraint
is imposed, the capacity achieving input distribution for this case has a discrete iid
amplitude and an irrelevant phase. However, there exists no closed-form expression
for the capacity in this case. Moreover, in the same work it has been shown that for
relatively low average signal-to-noise ratios, when $P/N_0$ is less than 8 dB, only two
signal levels, one of them at zero, are sufficient to achieve capacity; i.e., in this case
on-off signaling is optimal. As the signal-to-noise ratio decreases, the amplitude of the
nonzero input in the optimal on-off signaling increases, and in the limit for $P/N_0 \to 0$
we obtain

$$C = \frac{1}{\ln 2}\, \frac{P}{N_0} \approx 1.44\, \frac{P}{N_0} \tag{14.2-14}$$

By comparing this result with Equation 14.1-8 it is seen that for low signal-to-noise
ratios the capacity is equal to the capacity of an AWGN channel; but at high
signal-to-noise ratios the capacity is much lower than the capacity of an AWGN channel.

Although no closed form for the capacity exists, a parametric expression for the
capacity is derived in Taricco and Elia (1997). The parametric form of the capacity is
given by

$$\begin{aligned} P &= \mu\, e^{-\gamma - \Psi(\mu)} - 1 \\ C &= \frac{\mu - \gamma - \mu \Psi(\mu) - 1}{\ln 2} + \log_2 \Gamma(\mu) \end{aligned} \tag{14.2-15}$$

where $\Psi(z)$ is the digamma function defined by

$$\Psi(z) = \frac{\Gamma'(z)}{\Gamma(z)} \tag{14.2-16}$$

and $\gamma = -\Psi(1) \approx 0.5772156$ is Euler's constant.

A plot of capacity in this case is shown in Figure 14.2-3. The capacity of AWGN 
is also given for reference. It is clearly seen that lack of information about the channel 
state is particularly harmful at high signal-to-noise ratios. 


FIGURE 14.2-3

The ergodic capacity of a Rayleigh fading channel with no CSI.

State Information at the Receiver Since in this case the phase of the fading process
is available at the receiver, the receiver can compensate for this phase; hence without loss
of generality we can assume that fading is modeled by a multiplicative real coefficient
$R$ with Rayleigh distribution whose effect on the power is a multiplicative coefficient $\rho$
with exponential PDF. Using Equation 14.1-22, we have to find the expected value of
the mutual information over all possible states. This corresponds to finding the expected
value of

$$C = \log\left(1 + \rho\, \frac{P}{N_0}\right) \tag{14.2-17}$$

in which $\rho$ has an exponential PDF given by Equation 14.2-8. Since log is a concave
function, we can use Jensen's inequality (see Problem 6.29) to show that

$$\bar{C} = E\left[\log\left(1 + \rho\, \frac{P}{N_0}\right)\right] \le \log\left(1 + E[\rho]\, \frac{P}{N_0}\right) = \log\left(1 + \frac{P}{N_0}\right) \tag{14.2-18}$$

This shows that in this case the capacity is upper-bounded by the capacity of an AWGN
channel whose signal-to-noise ratio is equal to the average signal-to-noise ratio of the
Rayleigh fading channel.

To find an expression for the capacity in this case, we note that

$$\bar{C} = \int_0^\infty \log\left(1 + \rho\, \frac{P}{N_0}\right) e^{-\rho}\, d\rho = \frac{1}{\ln 2}\, e^{1/\text{SNR}}\, \Gamma\!\left(0, \frac{1}{\text{SNR}}\right) \tag{14.2-19}$$

where $\text{SNR} = P/N_0$ and $\Gamma(a, z)$ denotes the complementary gamma function, defined by

$$\Gamma(a, z) = \int_z^\infty t^{a-1} e^{-t}\, dt \tag{14.2-20}$$

Note that $\Gamma(a, 0) = \Gamma(a)$.

At low SNR values we can use the approximation

$$\log\left(1 + \rho\, \frac{P}{N_0}\right) \approx \frac{1}{\ln 2}\, \rho\, \frac{P}{N_0} \tag{14.2-21}$$

and therefore at low signal-to-noise ratios the capacity is given by

$$\bar{C} \approx \frac{P}{N_0 \ln 2} \int_0^\infty \rho\, e^{-\rho}\, d\rho \approx 1.44\, \text{SNR} \tag{14.2-22}$$

which is equal to the capacity of an AWGN channel at low signal-to-noise ratios. At
high signal-to-noise ratios we have

$$\log\left(1 + \rho\, \frac{P}{N_0}\right) \approx \log\left(\rho\, \frac{P}{N_0}\right) \tag{14.2-23}$$

and the capacity becomes

$$\begin{aligned} \bar{C} &\approx \int_0^\infty \log\left(\rho\, \frac{P}{N_0}\right) e^{-\rho}\, d\rho \\ &= \log \text{SNR} + \frac{1}{\ln 2} \int_0^\infty (\ln \rho)\, e^{-\rho}\, d\rho \\ &= \log \text{SNR} - 0.8327 \end{aligned} \tag{14.2-24}$$

Note that the capacity of an AWGN channel at high signal-to-noise ratios is approximated
by $\log \text{SNR}$; therefore at high signal-to-noise ratios, the ergodic capacity of a
Rayleigh fading channel with channel state information at the receiver lags the capacity
of the AWGN channel by 0.83 bit per complex dimension.

Plots of the capacities of this channel model and the capacity of an AWGN channel
with comparable SNR are given in Figure 14.2-4. Unlike the case where no CSI
is available, in this case the asymptotic difference between the two curves at high
signal-to-noise ratios is roughly 2.5 dB. This compares very favorably with the
performance difference of different signaling schemes over Rayleigh fading and AWGN
channels. We recall from Equation 13.3-13 that the error probability of common signaling
schemes over Rayleigh fading channels decreases inversely with the signal-to-noise
ratio, whereas on Gaussian channels the error probability is an exponentially decreasing
function of the signal-to-noise ratio. For instance, to achieve an error probability of $10^{-5}$
using BPSK, an AWGN channel requires an SNR per bit of 9.6 dB and a Rayleigh fading
channel requires 44 dB. This is a huge performance difference. The much lower performance
difference between capacities is highly promising and indicates that coding can provide
considerable gain in fading channels. The required length of the codewords on fading
channels is largely dependent on the dynamics of the fading process and the coherence
time of the channel, whereas in an AWGN channel only the AWGN effects are averaged
over a codeword. In a fading channel, in addition to noise effects, fading effects have
to be averaged out over the codeword length. If the channel coherence time is large,
this could require very large codeword lengths and could entail unacceptable delay.
Interleaving is often used to reduce large codeword requirements, but it cannot reduce
the delay in fading channels. Another alternative would be to spread the transmitted
code components in the frequency domain to benefit from the diversity. This approach
is studied in Section 14.7.

FIGURE 14.2-4

Capacity of Gaussian and Rayleigh fading channel with CSI at the decoder.
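
The closed form for the ergodic capacity with CSI at the receiver, $\frac{1}{\ln 2} e^{1/\text{SNR}}\, \Gamma(0, 1/\text{SNR})$, is easy to evaluate numerically. The sketch below uses only the Python standard library and computes $e^{x} \Gamma(0, x)$ with a power series for small $x$ and a continued fraction for large $x$; this numerical method and all function names are assumptions made for illustration and are not taken from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329

def scaled_exp1(x):
    """Return exp(x) * Gamma(0, x) for x > 0.

    Gamma(0, x) is the exponential integral E1(x). A power series is used
    for x <= 1 and a continued fraction (modified Lentz method) otherwise.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 100):
            term *= -x / k          # term = (-x)^k / k!
            total -= term / k
            if abs(term) < 1e-18:
                break
        return math.exp(x) * total
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 500):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        step = c * d
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return h

def ergodic_capacity_csir(snr):
    """Ergodic capacity in bits per complex dimension, CSI at the receiver."""
    return scaled_exp1(1.0 / snr) / math.log(2.0)

def awgn_capacity(snr):
    return math.log2(1.0 + snr)

for snr_db in (-20, 0, 10, 30):
    snr = 10.0 ** (snr_db / 10.0)
    print(f"{snr_db:>4} dB: Rayleigh {ergodic_capacity_csir(snr):.4f}, "
          f"AWGN {awgn_capacity(snr):.4f}")
```

At 30 dB the gap to the AWGN capacity is close to the asymptotic value of 0.83 bit, and at low SNR both capacities approach 1.44 SNR.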

State Information Available at Both Sides If the state information is available at
both the transmitter and the receiver, then the result of Equation 14.1-23 can be used.
In this case the transmitter can adjust its power level to the fading level similar to the
water-filling approach in the frequency domain. Water-filling in time can be employed
to allocate the optimal transmitted power as a function of channel state information.
Here $\rho$, the channel state, plays the same role as frequency in the standard water-filling
argument, and the capacity is given by

$$C = \int_0^\infty \log\left(1 + \rho\, \frac{P(\rho)}{N_0}\right) e^{-\rho}\, d\rho \tag{14.2-25}$$

where $P(\rho)$ denotes the optimum power allocation as a function of the fading parameter
$\rho$. The optimal power allocation is obtained by using water-filling in time, i.e.,

$$P(\rho) = N_0 \left(\frac{1}{\rho_0} - \frac{1}{\rho}\right)^+ \tag{14.2-26}$$

where as before $(x)^+ = \max\{x, 0\}$, and $\rho_0$ is selected such that

$$\int_0^\infty P(\rho)\, e^{-\rho}\, d\rho = P \tag{14.2-27}$$

Note that from above

$$P(\rho) = \begin{cases} N_0 \left(\dfrac{1}{\rho_0} - \dfrac{1}{\rho}\right) & \rho > \rho_0 \\ 0 & \rho < \rho_0 \end{cases} \tag{14.2-28}$$

Hence, Equation 14.2-27 becomes

$$\int_{\rho_0}^\infty \left(\frac{1}{\rho_0} - \frac{1}{\rho}\right) e^{-\rho}\, d\rho = \frac{P}{N_0} \tag{14.2-29}$$

This equation can be simplified as

$$\frac{e^{-\rho_0}}{\rho_0} - \Gamma(0, \rho_0) = \frac{P}{N_0} \tag{14.2-30}$$

where $\Gamma(a, z)$ is given by Equation 14.2-20. Substituting $P(\rho)$ in the expression for
capacity results in

$$\begin{aligned} C &= \int_{\rho_0}^\infty \log\left(1 + \rho \left(\frac{1}{\rho_0} - \frac{1}{\rho}\right)\right) e^{-\rho}\, d\rho \\ &= \int_{\rho_0}^\infty e^{-\rho} \log \frac{\rho}{\rho_0}\, d\rho \\ &= \frac{1}{\ln 2}\, \Gamma(0, \rho_0) \end{aligned} \tag{14.2-31}$$

Equations 14.2-30 and 14.2-31 provide a parametric description of the capacity of this
channel model.

It is interesting to compare the capacity of this channel with an AWGN channel
at low and high signal-to-noise ratios. For a very low signal-to-noise ratio, we consider
the case where SNR = 0.1, corresponding to -10 dB. Substituting this value into
Equation 14.2-30 results in $\rho_0 = 1.166$. Substituting this value into Equation 14.2-31
yields $C = 0.241$. Computing the capacity of an AWGN channel at SNR = -10
dB yields $C = 0.137$. Interestingly, the capacity of the fading channel at low
signal-to-noise ratios in this case exceeds the capacity of a comparable AWGN channel. At
high signal-to-noise ratios, however, the capacity is less than the capacity of an AWGN
channel and is very close to the capacity of a Rayleigh fading channel for which the
state information is available only at the receiver.

A plot of capacity of this channel versus the signal-to-noise ratio is given in
Figure 14.2-5. The capacity of an AWGN channel is also provided for comparison.

Figure 14.2-6 compares the capacities of Rayleigh fading channels under different
state-information scenarios with the capacity of the Gaussian channel.
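
The parametric description with CSI at both sides can be turned into a small numerical procedure: solve $e^{-\rho_0}/\rho_0 - \Gamma(0, \rho_0) = \text{SNR}$ for the threshold $\rho_0$ and then evaluate $C = \Gamma(0, \rho_0)/\ln 2$. The sketch below does this by bisection, using the fact that the left-hand side is decreasing in $\rho_0$. The bisection bracket, the series and continued-fraction evaluation of $\Gamma(0, x)$, and the function names are assumptions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp1(x):
    """Gamma(0, x) = E1(x) for x > 0 (series for x <= 1, continued fraction otherwise)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 100):
            term *= -x / k          # term = (-x)^k / k!
            total -= term / k
            if abs(term) < 1e-18:
                break
        return total
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 500):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        step = c * d
        h *= step
        if abs(step - 1.0) < 1e-15:
            break
    return h * math.exp(-x)

def power_constraint(rho0):
    """Left-hand side of the threshold equation: average of (1/rho0 - 1/rho)^+."""
    return math.exp(-rho0) / rho0 - exp1(rho0)

def waterfilling_capacity(snr):
    """Return (rho0, capacity in bits) for water-filling in time at the given SNR."""
    lo, hi = 1e-9, 200.0        # power_constraint is decreasing on this bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if power_constraint(mid) > snr:
            lo = mid
        else:
            hi = mid
    rho0 = 0.5 * (lo + hi)
    return rho0, exp1(rho0) / math.log(2.0)

rho0, capacity = waterfilling_capacity(0.1)
print(f"rho0 = {rho0:.3f}, C = {capacity:.3f} bits, "
      f"AWGN = {math.log2(1.1):.3f} bits")
```

Running the sketch at SNR = 0.1 reproduces the values quoted above for the threshold and the capacity, and the result exceeds the AWGN capacity at the same SNR.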


14.2-2 The Outage Capacity of Rayleigh Fading Channels 

The outage capacity is considered when, due to strict delay restrictions, ideal
interleaving is impossible and the channel capacity cannot be expressed as the average
of the capacities for all possible channel realizations, as was done in the case of the
ergodic capacity. In this case the capacity is a random variable (Ozarow et al. (1994)).
We assume that at rates less than capacity ideal coding is employed to make transmission
effectively error-free. With this assumption, errors occur only when the rate exceeds
capacity, i.e., when the channel is in outage.

FIGURE 14.2-5

Capacity of Gaussian and Rayleigh fading channel with CSI at both sides.

FIGURE 14.2-6

Capacity of Gaussian and Rayleigh fading channel with different CSI.
For a Rayleigh fading channel the $\epsilon$-outage capacity is derived by using
Equations 14.2-3 and 14.2-4 as

$$\begin{aligned} C_\epsilon &= \max\{R : P_{\text{out}}(R) \le \epsilon\} \\ &= \max\{R : F_C(R^-) = \epsilon\} \\ &= F_C^{-1}(\epsilon) \end{aligned} \tag{14.2-32}$$

where $F_C(\cdot)$ is the CDF of the random variable representing the channel capacity.

For a Rayleigh fading channel with normalized channel gain, we have

$$C = \log(1 + \rho\, \text{SNR}) \tag{14.2-33}$$

where $\rho$ is an exponential random variable with expected value equal to 1. The outage
probability in this case is given by

$$P_{\text{out}}(R) = P[C < R] \tag{14.2-34}$$

which simplifies to

$$P_{\text{out}}(R) = P\left[\rho < \frac{2^R - 1}{\text{SNR}}\right] = 1 - e^{-(2^R - 1)/\text{SNR}} \tag{14.2-35}$$

Note that for high signal-to-noise ratios, i.e., for low outage probabilities, this expression
can be approximated by

$$P_{\text{out}}(R) \approx \frac{2^R - 1}{\text{SNR}} \tag{14.2-36}$$

Solving for $R$ from Equation 14.2-35 results in

$$R = \log\left[1 - \text{SNR}\, \ln(1 - P_{\text{out}})\right] \tag{14.2-37}$$

from which

$$C_\epsilon = \log\left[1 - \text{SNR}\, \ln(1 - \epsilon)\right] \tag{14.2-38}$$


We consider the cases of low and high signal-to-noise ratios separately. For low
SNR values we have

$$C_\epsilon \approx \frac{\text{SNR}}{\ln 2}\, \ln \frac{1}{1 - \epsilon} \tag{14.2-39}$$

Since the capacity of an AWGN channel at low SNR values is $\frac{1}{\ln 2}\, \text{SNR}$, we conclude that the
outage capacity is a fraction of the capacity of an AWGN channel. In fact the capacity
of an AWGN channel is scaled by a factor of $\ln \frac{1}{1 - \epsilon}$. For instance, for $\epsilon = 0.1$ this
value is equal to 0.105, and the outage capacity of the Rayleigh fading channel is only
about one-tenth of the capacity of an AWGN channel with the same power. For very small $\epsilon$,
this factor tends to $\epsilon$ and we have

$$C_\epsilon \approx \epsilon\, C_{\text{AWGN}} \tag{14.2-40}$$
For high signal-to-noise ratios, the capacity is approximated by

$$C_\epsilon \approx \log\left(\text{SNR}\, \ln \frac{1}{1 - \epsilon}\right) = \log \text{SNR} + \log\left(\ln \frac{1}{1 - \epsilon}\right) \tag{14.2-41}$$

The capacity of an AWGN channel at high SNR is $\log \text{SNR}$; therefore the outage
capacity of the Rayleigh fading channel is less than the capacity of a comparable
AWGN channel by $\log\left(1 / \ln \frac{1}{1 - \epsilon}\right)$ bits per complex dimension. For $\epsilon = 0.1$ this is equal
to 3.25 bits per complex dimension. For very small $\epsilon$ we have $\ln \frac{1}{1 - \epsilon} \approx \epsilon$, and the
difference between the capacities is $\log \frac{1}{\epsilon}$.

The outage capacity of a Rayleigh fading channel for $\epsilon = 0.1$ and $\epsilon = 0.01$ and
the capacity of the AWGN channel are shown in Figure 14.2-7.
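
The closed-form outage expressions for the Rayleigh fading channel can be checked with a few lines of code. The sketch below evaluates the outage probability $1 - e^{-(2^R - 1)/\text{SNR}}$ and the $\epsilon$-outage capacity $\log_2[1 - \text{SNR} \ln(1 - \epsilon)]$, then compares the result with the AWGN capacity at one very low and one very high SNR. The function names and the particular SNR values are illustrative assumptions.

```python
import math

def outage_probability(rate, snr):
    """P_out(R) for a Rayleigh fading channel with unit-mean exponential gain."""
    return 1.0 - math.exp(-(2.0 ** rate - 1.0) / snr)

def outage_capacity(eps, snr):
    """Epsilon-outage capacity in bits per complex dimension."""
    return math.log2(1.0 - snr * math.log(1.0 - eps))

eps = 0.1
# Low SNR: the outage capacity is about ln(1 / (1 - eps)) times the AWGN capacity.
low_snr = 1e-4
low_ratio = outage_capacity(eps, low_snr) / math.log2(1.0 + low_snr)
# High SNR: the outage capacity trails the AWGN capacity by a constant number of bits.
high_snr = 1e6
high_gap = math.log2(1.0 + high_snr) - outage_capacity(eps, high_snr)
print(f"low-SNR ratio to AWGN capacity: {low_ratio:.3f}")
print(f"high-SNR gap to AWGN capacity:  {high_gap:.2f} bits")
```

Transmitting at exactly the rate returned by `outage_capacity(eps, snr)` gives an outage probability of `eps`, which is a convenient consistency check between the two functions.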


Effect of Diversity on Outage Capacity 

If a communication system over a Rayleigh fading channel employs $L$-order diversity,
then the random variable $\rho = |R|^2$ has a $\chi^2$ PDF with $2L$ degrees of freedom. In the
special case of $L = 1$ we have a $\chi^2$ random variable with two degrees of freedom,
which is the exponential random variable studied so far. For $L$-order diversity we use
the CDF of a $\chi^2$ random variable given by Equation 2.3-24. We obtain

$$P_{\text{out}}(R) = P\left[\rho < \frac{2^R - 1}{\text{SNR}}\right] = 1 - e^{-(2^R - 1)/\text{SNR}} \sum_{k=0}^{L-1} \frac{1}{k!} \left(\frac{2^R - 1}{\text{SNR}}\right)^k \tag{14.2-42}$$

Equating $P_{\text{out}}(R)$ to $\epsilon$ and solving for $R$ give the $\epsilon$-outage capacity $C_\epsilon$ for a channel
with $L$-order diversity. The resulting $C_\epsilon$ is obtained by solving the equation

$$e^{-(2^{C_\epsilon} - 1)/\text{SNR}} \sum_{k=0}^{L-1} \frac{1}{k!} \left(\frac{2^{C_\epsilon} - 1}{\text{SNR}}\right)^k = 1 - \epsilon \tag{14.2-43}$$

or equivalently

$$e^{-(2^{C_\epsilon} - 1)/\text{SNR}} \sum_{k=L}^{\infty} \frac{1}{k!} \left(\frac{2^{C_\epsilon} - 1}{\text{SNR}}\right)^k = \epsilon \tag{14.2-44}$$

No closed-form solution for $C_\epsilon$ exists for arbitrary $L$. Plots of $C_{0.01}$ for different diversity
orders as well as the capacity of an AWGN channel are given in Figure 14.2-8. The
noticeable improvement due to diversity is clear from this figure.

FIGURE 14.2-7

The outage capacity of a Rayleigh fading channel for $\epsilon = 0.1$ and $\epsilon = 0.01$. The capacity of an
AWGN channel is given for comparison.
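
Since the equation for $C_\epsilon$ with $L$-order diversity has no closed-form solution, it has to be solved numerically. The following sketch uses bisection on the rate, relying on the fact that the outage probability is increasing in $R$. It assumes the outage probability has the form $1 - e^{-x} \sum_{k=0}^{L-1} x^k / k!$ with $x = (2^R - 1)/\text{SNR}$; the function names and tolerance are illustrative choices.

```python
import math

def outage_probability_diversity(rate, snr, order):
    """Outage probability with L-order diversity (L = order)."""
    x = (2.0 ** rate - 1.0) / snr
    partial_sum = sum(x ** k / math.factorial(k) for k in range(order))
    return 1.0 - math.exp(-x) * partial_sum

def outage_capacity_diversity(eps, snr, order, tol=1e-12):
    """Solve P_out(R) = eps for R by bisection; returns bits per complex dimension."""
    lo, hi = 0.0, 1.0
    # Grow the bracket until the upper end is in outage with probability >= eps.
    while outage_probability_diversity(hi, snr, order) < eps:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if outage_probability_diversity(mid, snr, order) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

snr = 100.0  # 20 dB
for order in (1, 2, 4):
    c_eps = outage_capacity_diversity(0.01, snr, order)
    print(f"L = {order}: C_0.01 = {c_eps:.3f} bits")
```

For $L = 1$ the numerical solution agrees with the closed form $\log_2[1 - \text{SNR} \ln(1 - \epsilon)]$, and the outage capacity grows quickly as the diversity order increases.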



FIGURE 14.2-8 

The outage capacity of fading channels with different diversity orders. 
14.3 CODING FOR FADING CHANNELS

In Chapter 13 we have demonstrated that diversity techniques are very effective in
overcoming the detrimental effects of fading caused by the time-variant dispersive 
characteristics of the channel. Time and/or frequency diversity techniques may be 
viewed as a form of repetition (block) coding of the information sequence. From this 
point of view, the combining techniques described in Chapter 13 represent soft decision 
decoding of the repetition code. Since a repetition code is a trivial form of coding, 
we now consider the additional benefits derived from more efficient types of codes. In 
particular, we demonstrate that coding provides an efficient means of obtaining diversity 
on a fading channel. The amount of diversity provided by a code is directly related to 
its minimum distance. 

As explained in Section 13.4, time diversity is obtained by transmitting the signal
components carrying the same information in multiple time intervals mutually separated
by an amount equal to or exceeding the coherence time $(\Delta t)_c$ of the channel. Similarly,
frequency diversity is obtained by transmitting the signal components carrying the same
information in multiple frequency slots mutually separated by an amount at least equal
to the coherence bandwidth $(\Delta f)_c$ of the channel. Thus, the signal components carrying
the same information undergo statistically independent fading.

To extend these notions to a coded information sequence, we simply require that the 
signal waveform corresponding to a particular code bit or code symbol fade indepen- 
dently of the signal waveform corresponding to any other code bit or code symbol. This 
requirement may result in inefficient utilization of the available time-frequency space, 
with the existence of large unused portions in this two-dimensional signaling space. 
To reduce the inefficiency, a number of codewords may be interleaved in time or in 
frequency or both, in such a manner that the waveforms corresponding to the bits or sym- 
bols of a given codeword fade independently. Thus, we assume that the time-frequency 
signaling space is partitioned into nonoverlapping time-frequency cells. A signal wave- 
form corresponding to a code bit or code symbol is transmitted within such a cell. 

In addition to the assumption of statistically independent fading of the signal com- 
ponents of a given codeword, we assume that the additive noise components corrupting 
the received signals are white Gaussian processes that are statistically independent and 
identically distributed among the cells in the time-frequency space. Also, we assume 
that there is sufficient separation between adjacent cells that intercell interference is 
negligible. 

An important issue is the modulation technique that is used to transmit the coded 
information sequence. If the channel fades slowly enough to allow the establishment 
of a phase reference, then PSK or DPSK may be employed. In the case where channel 
state information (CSI) is available at the receiver, knowledge of the phase makes co- 
herent detection possible. If this is not possible, then FSK modulation with noncoherent 
detection at the receiver is appropriate. 

A model of the digital communication system for which the error rate performance
will be evaluated is shown in Figure 14.3-1. The encoder may be binary, nonbinary, or
a concatenation of a nonbinary encoder with a binary encoder. Furthermore, the code
generated by the encoder may be a block code, a convolutional code, or, in the case of
concatenation, a mixture of a block code and a convolutional code.

FIGURE 14.3-1

Model of communications system with modulation/demodulation and encoding/decoding.

To explain the modulation, demodulation, and decoding, consider a linear binary
block code in which $k$ information bits are encoded into a block of $n$ bits. For simplicity
and without loss of generality, let us assume that all $n$ bits of a codeword are transmitted
simultaneously over the channel on multiple frequency/time cells. A codeword $c_i$ having
bits $\{c_{ij}\}$ is mapped into signal waveforms and interleaved in time and/or frequency and
transmitted. The dimensionality of the signal space depends on the modulation system.
For instance, if FSK modulation is employed, each transmitted symbol is a point in
the two-dimensional space, hence the dimensionality of the encoded/modulated signal
is $2n$. Since each codeword conveys $k$ bits of information, the bandwidth expansion
factor for FSK is $B_e = 2n/k$.

The demodulator demodulates the signal components transmitted in independently
faded frequency/time cells, providing the sufficient statistics to the decoder, which
appropriately combines them for each codeword to form the $M = 2^k$ decision variables.
The codeword corresponding to the maximum of the decision variables is selected. If
hard decision decoding is employed, the optimum maximum-likelihood decoder selects
the codeword having the smallest Hamming distance relative to the received codeword.
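
The hard-decision rule described above can be sketched for a small code by brute force over all $M = 2^k$ codewords. The example below uses a systematic (7, 4) Hamming code as a stand-in; the choice of code, the generator matrix, and the function names are assumptions made for illustration, and an exhaustive search of this kind is only practical for small $k$.

```python
from itertools import product

# Generator matrix of a systematic (7, 4) Hamming code (illustrative choice).
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]

def encode(message):
    """Multiply the message row vector by G over GF(2)."""
    return tuple(
        sum(m * g for m, g in zip(message, column)) % 2 for column in zip(*G)
    )

# All M = 2^k codewords, indexed by their information bits.
codebook = {msg: encode(msg) for msg in product((0, 1), repeat=4)}

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def decode_hard(received):
    """Return the message whose codeword is closest in Hamming distance."""
    return min(codebook, key=lambda msg: hamming_distance(codebook[msg], received))

message = (1, 0, 1, 1)
received = list(encode(message))
received[2] ^= 1  # one hard-decision error introduced by the channel
print(decode_hard(tuple(received)) == message)
```

Because this code has minimum distance 3, any single hard-decision error within a codeword is corrected by the minimum-distance rule.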

Although the discussion above assumed the use of a block code, a convolutional 
encoder can be easily accommodated in the block diagram shown in Figure 14.3-1 . For 
this case the maximum-likelihood soft decision decoding criterion for the convolutional 
code can be efficiently implemented by means of the Viterbi algorithm. On the other 
hand, if hard decision decoding is employed, the Viterbi algorithm is implemented with 
Hamming distance as the metric. 


14.4 PERFORMANCE OF CODED SYSTEMS IN FADING CHANNELS

In studying the capacity of fading channels in Section 14.2 we noted that the notion of
capacity in fading channels is more involved than the notion of capacity for a standard
memoryless channel. The capacity of a fading channel depends on the dynamics of the
fading process and how the coherence time of the channel compares with the code
length, as well as the availability of channel state information at the transmitter and
the receiver. In this section we study the performance of a coded system on a fading
channel, and we observe that the same factors affect the code performance.

We assume that a coding scheme followed by modulation, or a coded modulation 
scheme, is employed for data transmission over the fading channel. Our treatment 
at this point is quite general and includes block and convolutional codes as well as 
concatenated coding schemes followed by a general signaling (modulation) scheme. 
This treatment also includes block or trellis-coded modulation schemes. 

We assume that $M$ signal space coded sequences $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$ are employed
to transmit one of the equiprobable messages $1 \le m \le M$. Each codeword $\mathbf{x}_i$ is a
sequence of $n$ symbols of the form

$$\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in}) \tag{14.4-1}$$

where each $x_{ij}$ is a point in the signal constellation. We assume that the signal
constellation is two-dimensional, hence the $x_{ij}$'s are complex numbers.

Depending on the dynamics of fading and availability of channel state information,
we can study the effect of fading and derive bounds on the performance of the coding
scheme just described.


14.4-1 Coding for Fully Interleaved Channel Model

In this model we assume a very long interleaver is employed and the codeword
components are spread over a long interval, much longer than the channel coherence time.
As a result, we can assume that the components of the transmitted codeword undergo
independent fading. The channel output for this model, when $\mathbf{x}_i$ is sent, is given by

$$y_j = R_j x_{ij} + n_j, \qquad 1 \le j \le n \tag{14.4-2}$$

where the $R_j$'s represent the fading effect of the channel and the $n_j$'s are the noise. In this
model, due to the interleaving, the $R_j$'s are independent and the $n_j$'s are iid samples drawn
according to $\mathcal{CN}(0, N_0)$. The vector input-output relation for this channel is given by

$$\mathbf{y} = \mathbf{R}\mathbf{x} + \mathbf{n} \tag{14.4-3}$$

where $\mathbf{R}$ is an $n \times n$ diagonal matrix

$$\mathbf{R} = \operatorname{diag}(R_1, R_2, \ldots, R_n) = \begin{bmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_n \end{bmatrix} \tag{14.4-4}$$

and $\mathbf{n}$ is a vector with independent $n_j$'s as its components. The $R_j$'s are in general
complex, denoting the magnitude and the phase of the fading process.
The maximum-likelihood decoder, having received $\mathbf{y}$, uses the rule

$$\hat{m} = \arg\max_{1 \le m \le M} p(\mathbf{y} \mid \mathbf{x}_m) \tag{14.4-5}$$

to detect the transmitted message $m$. By the independence of fading and noise
components we have

$$p(\mathbf{y} \mid \mathbf{x}_m) = \prod_{j=1}^{n} p(y_j \mid x_{mj}) \tag{14.4-6}$$

The value of $p(y_j \mid x_{mj})$ depends on the availability of channel state information at the
receiver.

CSI Available at the Receiver In this case the output of the channel consists of the
output vector $\mathbf{y}$ and the channel state sequence $(r_1, r_2, \ldots, r_n)$, which are realizations of
random variables $R_1, R_2, \ldots, R_n$, or equivalently the realization of matrix $\mathbf{R}$. Therefore,
the maximum-likelihood rule, P[observed|input], becomes

$$\prod_{j=1}^{n} p(y_j, r_j \mid x_{mj}) = \prod_{j=1}^{n} p(r_j)\, p(y_j \mid x_{mj}, r_j) \tag{14.4-7}$$

Substituting Equation 14.4-7 into 14.4-5 and dropping the common positive factor
$\prod_{j=1}^{n} p(r_j)$ result in

$$\hat{m} = \arg\max_{1 \le m \le M} \prod_{j=1}^{n} p(y_j \mid x_{mj}, r_j) \tag{14.4-8}$$

No CSI Available at the Receiver In this case the ML rule is

$$\hat{m} = \arg\max_{1 \le m \le M} \prod_{j=1}^{n} p(y_j \mid x_{mj}) \tag{14.4-9}$$

where

$$p(y_j \mid x_{mj}) = \int_0^\infty p(r_j)\, p(y_j \mid x_{mj}, r_j)\, dr_j \tag{14.4-10}$$
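
For the case where CSI is available at the receiver, the maximum-likelihood rule takes a simple form under the $\mathcal{CN}(0, N_0)$ noise model: each factor $p(y_j \mid x_{mj}, r_j)$ is proportional to $\exp(-|y_j - r_j x_{mj}|^2 / N_0)$, so maximizing the product is equivalent to minimizing $\sum_j |y_j - r_j x_{mj}|^2$. The sketch below implements this minimum-distance form; the tiny codebook, the fading values, and the noise samples are made-up numbers used purely for illustration.

```python
def ml_decode_with_csi(y, r, codebook):
    """Return the index m minimizing sum_j |y_j - r_j * x_mj|^2.

    y: received complex samples, r: known complex fading coefficients,
    codebook: list of codewords, each a sequence of complex symbols.
    """
    def metric(codeword):
        return sum(abs(yj - rj * xj) ** 2 for yj, rj, xj in zip(y, r, codeword))
    return min(range(len(codebook)), key=lambda m: metric(codebook[m]))

# Hypothetical length-3 BPSK codebook with four codewords.
codebook = [
    (1 + 0j, 1 + 0j, 1 + 0j),
    (1 + 0j, -1 + 0j, -1 + 0j),
    (-1 + 0j, 1 + 0j, -1 + 0j),
    (-1 + 0j, -1 + 0j, 1 + 0j),
]
fading = (0.9j, 0.2 + 0.1j, -1.3 + 0j)     # one strongly faded component
noise = (0.05 - 0.02j, -0.03 + 0.04j, 0.02 + 0.01j)
sent = 2
received = tuple(
    rj * xj + nj for rj, xj, nj in zip(fading, codebook[sent], noise)
)
decoded = ml_decode_with_csi(received, fading, codebook)
print(decoded)
```

The second component is deeply faded and contributes little to the metric; the decision is carried by the other two components, which is the mechanism by which a code provides diversity.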


Performance of Fully Interleaved Fading Channels with CSI at the Receiver

A bound on error probability can be obtained by using an approach similar to the one
used in Section 6.8-1. Using Equation 6.8-2, we have

$$P_{e|m} \le \sum_{\substack{m'=1 \\ m' \ne m}}^{M} P\left[\mathbf{y} \in D_{mm'} \mid \mathbf{x}_m \text{ sent}\right] = \sum_{\substack{m'=1 \\ m' \ne m}}^{M} P_{m \to m'} \tag{14.4-11}$$
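where $P_{m \to m'}$ is the pairwise error probability (PEP), i.e., the probability of error in
a binary communication system consisting of two signals $\mathbf{x}_m$ and $\mathbf{x}_{m'}$ when $\mathbf{x}_m$ is
transmitted. Here we derive an upper bound on the pairwise error probability by using
the Chernov bounding technique. For other methods of studying the pairwise error
probability, the reader is referred to Biglieri et al. (1995, 1996, 1998a).

A Bound on the Pairwise Error Probability To compute a bound on the PEP, we
note that since in this case CSI is available at the receiver, according to Equation 14.4-8,
the channel conditional probabilities are $p(y_j \mid x_{mj}, r_j)$ and hence

$$P_{m \to m'} = \int P\left[\mathbf{x}_m \to \mathbf{x}_{m'} \mid \mathbf{R} = \mathbf{r}\right] p(\mathbf{r})\, d\mathbf{r} \tag{14.4-12}$$

where

$$P\left[\mathbf{x}_m \to \mathbf{x}_{m'} \mid \mathbf{R} = \mathbf{r}\right] = P\left[\ln \frac{p(\mathbf{y} \mid \mathbf{x}_{m'}, \mathbf{r})}{p(\mathbf{y} \mid \mathbf{x}_m, \mathbf{r})} > 0 \,\Big|\, \mathbf{R} = \mathbf{r}\right] = P\left[Z_{mm'}(\mathbf{r}) > 0\right] \tag{14.4-13}$$

and the likelihood ratio $Z_{mm'}(\mathbf{r})$ becomes

$$Z_{mm'}(\mathbf{r}) = \ln \frac{p(\mathbf{y} \mid \mathbf{x}_{m'}, \mathbf{r})}{p(\mathbf{y} \mid \mathbf{x}_m, \mathbf{r})} = \frac{1}{N_0} \sum_{j=1}^{n} Z_{mm'j}(r_j) \tag{14.4-14}$$

with

$$Z_{mm'j}(r_j) = |y_j - r_j x_{mj}|^2 - |y_j - r_j x_{m'j}|^2 \tag{14.4-15}$$

Since we are assuming $\mathbf{x}_m$ is transmitted, we have $y_j = r_j x_{mj} + n_j$. Substituting this
into Equation 14.4-15 and simplifying yield

$$Z_{mm'j}(r_j) = -|r_j|^2 d_{mm'j}^2 - N_j \tag{14.4-16}$$

where $N_j$ is a real zero-mean Gaussian random variable with variance $2 |r_j|^2 d_{mm'j}^2 N_0$
and $d_{mm'j}$ is the Euclidean distance between the constellation points representing the
$j$th components of $\mathbf{x}_m$ and $\mathbf{x}_{m'}$.

Substituting Equation 14.4-16 into Equation 14.4-14 yields

$$Z_{mm'}(\mathbf{r}) = -\frac{1}{N_0} \sum_{j=1}^{n} \left(|r_j|^2 d_{mm'j}^2 + N_j\right) \tag{14.4-17}$$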
Using this result, Equation 14.4-13 gives

$$P[x_m \to x_{m'} \mid R = r] = P\left[\sum_{j=1}^{n}\left(|R_j|^2 d^2_{mm'j} + N_j\right) < 0 \,\Big|\, R = r\right] \tag{14.4-18}$$


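Applying the Chernov bounding technique discussed in Section 2.4 gives

$$P\left[\sum_{j=1}^{n}\left(|R_j|^2 d^2_{mm'j} + N_j\right) < 0 \,\Big|\, R = r\right] = E\left[\mathbb{1}\left\{\sum_{j=1}^{n}\left(|R_j|^2 d^2_{mm'j} + N_j\right) < 0\right\} \Big|\, R = r\right] \le \min_{\nu < 0} \prod_{j=1}^{n} E\left[e^{\nu\left(|R_j|^2 d^2_{mm'j} + N_j\right)} \Big|\, R_j = r_j\right] \tag{14.4-19}$$

where $|R_j|$ denotes the envelope of the fading process. Substituting this result into
Equation 14.4-12 gives

$$P_{m \to m'} \le \min_{\nu < 0} \prod_{j=1}^{n} \int E\left[e^{\nu\left(|R_j|^2 d^2_{mm'j} + N_j\right)} \Big|\, R_j = r_j\right] p(r_j)\, dr_j \tag{14.4-20}$$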
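The tightness of the Chernov step can be checked numerically for a fixed fading realization. Assuming the $N_j$ are independent, the sum in Equation 14.4-18 is Gaussian with mean $\mu = \sum_j |r_j|^2 d^2_{mm'j}$ and variance $2N_0\mu$, so the conditional PEP is $Q(\sqrt{\mu/2N_0})$, while minimizing the Chernov exponent (at $\nu = -1/2N_0$) gives $e^{-\mu/4N_0}$. The following sketch compares the two; the function names and the numerical values are illustrative assumptions.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def conditional_pep_exact(r, d2, n0):
    """Exact P[sum_j(|r_j|^2 d_j^2 + N_j) < 0] for fixed r (independent N_j assumed)."""
    mu = sum(abs(rj) ** 2 * dj for rj, dj in zip(r, d2))
    return q_function(math.sqrt(mu / (2 * n0)))


def conditional_pep_chernov(r, d2, n0):
    """Chernov bound exp(-mu / (4 N0)), obtained at nu = -1 / (2 N0)."""
    mu = sum(abs(rj) ** 2 * dj for rj, dj in zip(r, d2))
    return math.exp(-mu / (4 * n0))


# Illustrative values: two differing components, squared distances d2.
r = [1.0, 0.5]
d2 = [2.0, 2.0]
n0 = 0.5
exact = conditional_pep_exact(r, d2, n0)
bound = conditional_pep_chernov(r, d2, n0)
```

The bound is looser than the exact value by a factor that grows slowly with $\mu$, but it has the product form needed to average over each $r_j$ separately.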


Ricean Fading Here we assume that $|R_j|$, the envelope of the fading process,
has a Ricean PDF as given by Equation 2.3-56. We can directly apply the result of
Example 2.4-2 in Section 2.4, and in particular Equation 2.4-25, to obtain

$$P_{m \to m'} \le \prod_{j=1}^{n} \frac{1}{1 + \frac{d^2_{mm'j}}{2N_0}\sigma^2} \exp\left[-\frac{\frac{d^2_{mm'j}}{4N_0} s^2}{1 + \frac{d^2_{mm'j}}{2N_0}\sigma^2}\right] \tag{14.4-21}$$

and finally, from Equation 14.4-11 we have

$$P_e \le \frac{1}{M} \sum_{m=1}^{M} \sum_{\substack{m'=1 \\ m' \ne m}}^{M} \prod_{j=1}^{n} \frac{1}{1 + \frac{d^2_{mm'j}}{2N_0}\sigma^2} \exp\left[-\frac{\frac{d^2_{mm'j}}{4N_0} s^2}{1 + \frac{d^2_{mm'j}}{2N_0}\sigma^2}\right] \tag{14.4-22}$$

In Equations 14.4-21 and 14.4-22, $\sigma^2$ and $s$ are the parameters of the Ricean random
variable determining the envelope of the fading process. The pairwise error probability
can also be expressed in terms of the Rice factor $K$ as (see Equation 2.4-26)

$$P_{m \to m'} \le \prod_{j=1}^{n} \frac{K + 1}{K + 1 + \frac{A d^2_{mm'j}}{4N_0}} \exp\left[-\frac{\frac{A K d^2_{mm'j}}{4N_0}}{K + 1 + \frac{A d^2_{mm'j}}{4N_0}}\right] \tag{14.4-23}$$

where $A = E[|R_j|^2] = s^2 + 2\sigma^2$ is the fading gain and $K = s^2/(2\sigma^2)$ is the Rice factor.
From Equations 14.4-21 and 14.4-23 it is seen that if for one particular codeword
component $j$ we have $x_{mj} = x_{m'j}$, and hence $d_{mm'j} = 0$, the corresponding term in
the product is equal to 1. Therefore, it is sufficient to consider only those terms in the
product for which $x_{mj} \ne x_{m'j}$. Let us denote the components $j$ for which $x_{mj} \ne x_{m'j}$
by $J_{mm'}$, i.e.,

$$J_{mm'} = \{1 \le j \le n : x_{mj} \ne x_{m'j}\} \tag{14.4-24}$$


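Then

$$P_{m \to m'} \le \prod_{j \in J_{mm'}} \frac{1}{1 + \frac{d^2_{mm'j}}{2N_0}\sigma^2} \exp\left[-\frac{\frac{d^2_{mm'j}}{4N_0} s^2}{1 + \frac{d^2_{mm'j}}{2N_0}\sigma^2}\right] \tag{14.4-25}$$

and in terms of the Rice factor,

$$P_{m \to m'} \le \prod_{j \in J_{mm'}} \frac{K + 1}{K + 1 + \frac{A d^2_{mm'j}}{4N_0}} \exp\left[-\frac{\frac{A K d^2_{mm'j}}{4N_0}}{K + 1 + \frac{A d^2_{mm'j}}{4N_0}}\right] \tag{14.4-26}$$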


For a normalized fading channel which does not change the transmitted energy, we
have $E[|R|^2] = A = 1$, and the pairwise error probability can be bounded by

$$P_{m \to m'} \le \prod_{j \in J_{mm'}} \frac{K + 1}{K + 1 + \frac{d^2_{mm'j}}{4N_0}} \exp\left[-\frac{\frac{K d^2_{mm'j}}{4N_0}}{K + 1 + \frac{d^2_{mm'j}}{4N_0}}\right] \tag{14.4-27}$$


Rayleigh Fading and Gaussian Channels For the special case of a Rayleigh fading
channel, i.e., in the extreme case of $s = K = 0$, we have

$$P_{m \to m'} \le \prod_{j \in J_{mm'}} \frac{1}{1 + \frac{d^2_{mm'j}}{2N_0}\sigma^2} \tag{14.4-28}$$

and for a normalized Rayleigh fading channel for which $2\sigma^2 = 1$, in which the received
power is equal to the transmitted power (see Equation 14.2-9), we obtain

$$P_{m \to m'} \le \prod_{j \in J_{mm'}} \frac{1}{1 + \frac{d^2_{mm'j}}{4N_0}} \tag{14.4-29}$$

The other extreme of a Ricean channel occurs when $K \to \infty$. In this case
the Ricean channel becomes a Gaussian channel. For this case Equation 14.4-27
reduces to

$$P_{m \to m'} \le \prod_{j \in J_{mm'}} e^{-d^2_{mm'j}/4N_0} \tag{14.4-30}$$

or

$$P_{m \to m'} \le e^{-d^2_{mm'}/4N_0} \tag{14.4-31}$$

where $d^2_{mm'} = \sum_{j \in J_{mm'}} d^2_{mm'j}$ is the squared Euclidean distance between $x_m$ and $x_{m'}$.
This is the standard result for a Gaussian channel used in Equation 4.2-72.
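The bound of Equation 14.4-26 and its two limiting cases can be evaluated with a few lines of code. The sketch below is illustrative only; the function name `pep_bound_ricean` and the sample squared distances are assumptions introduced here.

```python
import math


def pep_bound_ricean(d2, n0, k_factor, gain=1.0):
    """Evaluate the product bound of Equation 14.4-26.

    d2       -- squared Euclidean distances d_{mm'j}^2, one per component
    n0       -- noise spectral density N0
    k_factor -- Rice factor K (0 gives Rayleigh fading)
    gain     -- fading gain A = E[|R_j|^2]
    """
    bound = 1.0
    for d in d2:
        if d == 0:
            continue  # components with x_mj = x_m'j contribute a factor of 1
        x = gain * d / (4 * n0)
        denom = k_factor + 1 + x
        bound *= (k_factor + 1) / denom * math.exp(-k_factor * x / denom)
    return bound


d2 = [2.0, 4.0, 0.0]   # illustrative squared distances; third component is equal
n0 = 0.5
rayleigh = pep_bound_ricean(d2, n0, k_factor=0.0)      # Equation 14.4-29
near_gaussian = pep_bound_ricean(d2, n0, k_factor=1e9)  # approaches Equation 14.4-31
```

With these values the Rayleigh bound decays only as a product of inverse-SNR factors, whereas the large-$K$ bound approaches the exponential $e^{-d^2_{mm'}/4N_0}$ of the Gaussian channel.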


High Signal-to-Noise Ratio Approximation At high signal-to-noise ratios when
$\frac{A d^2_{mm'j}}{4N_0} \gg K + 1$, the bound in Equation 14.4-26 can be approximated as

$$P_{m \to m'} \le \prod_{j \in J_{mm'}} \frac{(K + 1)e^{-K}}{\frac{A d^2_{mm'j}}{4N_0}} \tag{14.4-32}$$


We define the Hamming distance between $x_m$ and $x_{m'}$ as the cardinality of the set
$J_{mm'}$, i.e., the number of components at which $x_m$ and $x_{m'}$ are different.

$$d_H(x_m, x_{m'}) = |J_{mm'}| = |\{1 \le j \le n : x_{mj} \ne x_{m'j}\}| \tag{14.4-33}$$

The product distance of a code is defined as

$$\delta^2(x_m, x_{m'}) = \left(\frac{1}{\mathcal{E}_s}\right)^{d_H(x_m, x_{m'})} \prod_{j \in J_{mm'}} d^2_{mm'j} \tag{14.4-34}$$

where $\mathcal{E}_s$ is the average energy per codeword, given by

$$\mathcal{E}_s = \frac{1}{M} \sum_{m=1}^{M} \|x_m\|^2 \tag{14.4-35}$$

Note that with this definition we have factored out the effect of the signal energy and have
defined the product distance for a normalized code, which is similar to the original
code, but has average energy equal to 1. With this definition, and taking $A = 1$,
Equation 14.4-32 can be written as

$$P_{m \to m'} \le \frac{\left[(1 + K)e^{-K}\right]^{d_H(x_m, x_{m'})}}{\delta^2(x_m, x_{m'}) \left(\frac{\mathcal{E}_s}{4N_0}\right)^{d_H(x_m, x_{m'})}} \tag{14.4-36}$$

or

$$P_{m \to m'} \le \left[\frac{(1 + K)e^{-K}}{\Gamma_{mm'} \frac{\mathcal{E}_s}{4N_0}}\right]^{d_H(x_m, x_{m'})} \tag{14.4-37}$$

where

$$\Gamma_{mm'} = \left(\delta^2(x_m, x_{m'})\right)^{1/d_H(x_m, x_{m'})} \tag{14.4-38}$$

is the geometric mean of the normalized squared Euclidean distances of the unequal components of $x_m$ and
$x_{m'}$. Note that the signal-to-noise ratio is multiplied by $\Gamma_{mm'}$, which we call the coding
gain of sequences $x_m$ and $x_{m'}$ due to its similarity to the Gaussian case.
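The three quantities defined in Equations 14.4-33, 14.4-34, and 14.4-38 are straightforward to compute for a given pair of codewords. The following is a minimal sketch; the function names, the sample codewords, and the value of the average energy passed in as `es` are hypothetical.

```python
import math


def hamming_distance(xm, xmp):
    """Number of components in which the two codewords differ (Eq. 14.4-33)."""
    return sum(1 for a, b in zip(xm, xmp) if a != b)


def product_distance(xm, xmp, es):
    """Normalized product of squared distances over differing components (Eq. 14.4-34)."""
    d_h = hamming_distance(xm, xmp)
    prod = math.prod(abs(a - b) ** 2 for a, b in zip(xm, xmp) if a != b)
    return prod / es ** d_h


def coding_gain(xm, xmp, es):
    """Geometric mean of the normalized squared distances (Eq. 14.4-38)."""
    return product_distance(xm, xmp, es) ** (1.0 / hamming_distance(xm, xmp))


# Hypothetical codewords with complex (PSK-like) components; es = 1 is assumed.
xm = [1, 1, 1j]
xmp = [1, -1, -1j]
d_h = hamming_distance(xm, xmp)
delta2 = product_distance(xm, xmp, es=1.0)
gamma = coding_gain(xm, xmp, es=1.0)
```

Here the codewords differ in two components, each at squared distance 4, so the coding gain equals that common per-component value.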

Using Equation 14.4-37 in Equation 14.4-22, we obtain the following approximate
bound:

$$P_e \le \frac{1}{M} \sum_{m=1}^{M} \sum_{\substack{m'=1 \\ m' \ne m}}^{M} \left[\frac{(1 + K)e^{-K}}{\Gamma_{mm'} \frac{\mathcal{E}_s}{4N_0}}\right]^{d_H(x_m, x_{m'})} \tag{14.4-39}$$

For reasonably high signal-to-noise ratios, the dominating term in Equation 14.4-39 is
the term corresponding to the codewords with the minimum Hamming distance. In this
case we have

$$P_e \le (M - 1) \left[\frac{(1 + K)e^{-K}}{\Gamma_{\min} \frac{\mathcal{E}_s}{4N_0}}\right]^{d_{\min}} \tag{14.4-40}$$

where $d_{\min}$ is the minimum Hamming distance of the code and

$$\Gamma_{\min} = \left(\delta^2_{\min}\right)^{1/d_{\min}} \tag{14.4-41}$$

where $\delta^2_{\min}$ denotes the minimum of the product distances of the codeword pairs having
the minimum Hamming distance.

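For a Rayleigh fading channel $K = 0$ and for high signal-to-noise ratios, Equations
14.4-36, 14.4-37, 14.4-39, and 14.4-40 simplify to

$$P_{m \to m'} \le \frac{1}{\delta^2(x_m, x_{m'}) \left(\frac{\mathcal{E}_s}{4N_0}\right)^{d_H(x_m, x_{m'})}} \tag{14.4-42}$$

$$P_{m \to m'} \le \left[\frac{1}{\Gamma_{mm'} \frac{\mathcal{E}_s}{4N_0}}\right]^{d_H(x_m, x_{m'})} \tag{14.4-43}$$

$$P_e \le \frac{1}{M} \sum_{m=1}^{M} \sum_{\substack{m'=1 \\ m' \ne m}}^{M} \left[\frac{1}{\Gamma_{mm'} \frac{\mathcal{E}_s}{4N_0}}\right]^{d_H(x_m, x_{m'})} \tag{14.4-44}$$

$$P_e \le (M - 1) \left[\frac{1}{\Gamma_{\min} \frac{\mathcal{E}_s}{4N_0}}\right]^{d_{\min}} \tag{14.4-45}$$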


Note that in Equations 14.4-40 and 14.4-45 we have been rather conservative in
using the factor $(M - 1)$. This assumes that all codewords are at minimum
distance from the transmitted codeword and certainly results in an upper bound on the
error probability. A more realistic bound would be obtained if $(M - 1)$ were replaced
by the (average) number of codewords at distance $d_{\min}$, i.e., the multiplicity of the code,
denoted by $N_{\min}$.

Diversity Through Coding Since the product distance is defined for a unit-energy
constellation, its effect is independent of the signal-to-noise ratio. Its effect on the
performance of the coded system is to increase the signal-to-noise ratio, or shift the
performance plots by $\Gamma_{\min}$, the coding gain. A very important role is played by the minimum
Hamming distance of the code. Comparing Equations 14.4-42 to 14.4-45 with the
performance of diversity systems derived in Chapter 13, we note that in coded systems the
error probability is proportional to $(\text{SNR})^{-d_{\min}}$ and in a system with $L$-order diversity
the performance is proportional to $(\text{SNR})^{-L}$. We conclude that the effect of coding
is similar to the effect of an $L$-order diversity with $L = d_{\min}$. In other words, a code
with minimum Hamming distance of $d_{\min}$ provides diversity of order $d_{\min}$. This should be clear
by noting that a diversity system is equivalent to transmitting a signal $L$ times, and this
is similar to using a repetition code of length $L$ for which $d_{\min} = L$. Coding, however,
can provide greater flexibility in choice of the diversity order and can provide coding
gain as well. In the context of coding for fading channels, the parameter $d_{\min}$ of a code
is usually called the diversity order or the effective length of the code.
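The statement that the error probability falls as $(\text{SNR})^{-d_{\min}}$ can be made concrete by evaluating the Rayleigh bound of Equation 14.4-45 at two signal-to-noise ratios 10 dB apart. In the sketch below, SNR is taken to mean $\mathcal{E}_s/N_0$, and the function name and default parameter values are assumptions chosen only for illustration.

```python
def rayleigh_union_bound(snr_db, d_min, gamma_min=1.0, num_codewords=2):
    """Evaluate (M - 1) * (Gamma_min * SNR / 4) ** (-d_min), as in Equation 14.4-45."""
    snr = 10 ** (snr_db / 10)  # Es/N0 as a linear ratio (assumed definition of SNR)
    return (num_codewords - 1) * (gamma_min * snr / 4) ** (-d_min)


# Ratio of the bound at 20 dB to the bound at 30 dB for several diversity orders.
drops = {
    d_min: rayleigh_union_bound(20, d_min) / rayleigh_union_bound(30, d_min)
    for d_min in (1, 2, 4)
}
```

Each additional 10 dB of SNR lowers the bound by a factor of $10^{d_{\min}}$, which is the slope interpretation of the diversity order.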

From the above discussion it is clear that the factors affecting the performance of a
coded system on a Rayleigh fading channel are quite different from the factors affecting
the performance on Gaussian channels. On a Gaussian channel the performance of a
coded system is mainly determined by the minimum Euclidean distance of the code. In
other words, as long as the Euclidean distance between two codewords is large, it does
not matter how this distance is distributed among the code components. In a Rayleigh
fading channel, two parameters of the code contribute to its performance. The minimum
Hamming distance of the code determines the diversity order of the coded system and therefore
determines the slope of the error probability plots of the coded system. This is the most
important factor determining the code performance, particularly at high signal-to-noise
ratios. A second factor that affects the performance is the product distance of the code,
whose impact on the performance of the coded system is felt through the coding gain
$\Gamma_{\min}$. This effect is an additive effect on the performance plots (in decibels) and results in a horizontal
shift in performance curves. Since $\Gamma_{\min}$ is the geometric mean of the squared Euclidean distances
of the codeword components over nonequal components, and the geometric mean of
positive numbers with a constant sum is maximized when the numbers are equal, we
conclude that a good performing code over a Rayleigh fading channel must have all
the components different to provide the highest diversity and must have the overall
Euclidean distance equally distributed among the codeword components to achieve the
highest possible coding gain.
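The geometric-mean argument can be illustrated numerically. In the sketch below, the same total squared Euclidean distance is split over two components in two different ways; the function name and the values are illustrative assumptions.

```python
import math


def geometric_mean(values):
    """Geometric mean of a list of positive numbers."""
    return math.prod(values) ** (1.0 / len(values))


# Both splits have the same total squared Euclidean distance of 4.
balanced = geometric_mean([2.0, 2.0])
unbalanced = geometric_mean([3.9, 0.1])
```

On a Gaussian channel the two splits would perform alike, since only the sum matters; on a Rayleigh fading channel the balanced split yields the larger coding gain.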

Signal Space Diversity To describe the effect of diversity order of a coded system
in a Rayleigh fading channel and see the difference in performance between Rayleigh
fading and Gaussian channels, consider the two signal sets given in Figure 14.4-1.
The signal constellation (a) is a standard QPSK constellation, and (b) is a rotated
version of it. If fading affects only the quadrature component of the transmitted signal,
the constellation gets contracted in the vertical direction. Under these conditions the
constellation points move to the location denoted by the empty circles. If the fading
is quite deep, it is possible that the two constellation points with the same real part
collapse into the same point, thus causing considerable error probability. It is clear that
under these conditions the constellation shown in Figure 14.4-1(b) performs better than
the constellation of Figure 14.4-1(a). Note that the two constellations have the same
Euclidean distance between signal points, and hence their performance over Gaussian
channels is similar. The reason for better performance of constellation (b) is that it has
higher Hamming distance and hence provides higher diversity. The diversity order for
constellation (a) is 1, whereas the diversity order for constellation (b) is 2. This type of
diversity, which is a direct result of the choice of the points in the signal space, is called
signal space diversity. Note that in moving from constellation (a) to constellation (b) no
redundancy is introduced, and therefore the spectral efficiency of the communication
system has not been compromised. The better performance of signal space diversity
is achieved by a simple rotation of the constellation. It has been shown by Boutros
and Viterbo (1998) that this simple rotation can improve the performance of a QPSK
signaling scheme over a Rayleigh fading channel by 8 dB at error probability of $10^{-3}$.

FIGURE 14.4-1

The effect of Hamming distance on the performance of a coded system over fading channels.
[From Boutros and Viterbo (1998), copyright IEEE.]

Signal space diversity through rotation of a Gaussian constellation can be applied 
to signal constellations carved from a lattice. Using this technique results in a system 
with improved performance on fading channels at no bandwidth or power cost. The 
only drawback of these systems is increased detection complexity when compared with 
the unrotated lattice. Details on signal space diversity can be found in Boutros et al. 
(1996) and Boutros and Viterbo (1998). 
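The diversity orders quoted for the two QPSK constellations can be checked by treating the in-phase and quadrature parts of each point as its two components and counting, over all pairs of points, the minimum number of components that differ. The sketch below assumes the unrotated constellation has points at $(\pm 1 \pm j)/\sqrt{2}$ and uses a rotation of 30 degrees purely for illustration; the figure's actual rotation angle is not given in the text.

```python
import cmath
import math


def min_component_diversity(points, tol=1e-9):
    """Minimum number of differing (real, imaginary) components over all point pairs."""
    best = 2
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            differing = (abs(p.real - q.real) > tol) + (abs(p.imag - q.imag) > tol)
            best = min(best, differing)
    return best


def min_distance(points):
    """Minimum Euclidean distance between distinct points."""
    return min(abs(p - q) for i, p in enumerate(points) for q in points[i + 1:])


qpsk = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]
theta = math.radians(30)  # illustrative rotation angle (an assumption)
rotated = [p * cmath.exp(1j * theta) for p in qpsk]

diversity_standard = min_component_diversity(qpsk)
diversity_rotated = min_component_diversity(rotated)
```

The rotation leaves the minimum Euclidean distance unchanged, so the two constellations behave alike on a Gaussian channel, while only the rotated one keeps every pair of points distinguishable when one component is wiped out by a deep fade.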

Performance of Fully Interleaved Fading Channels with No CSI 

Derivation of the pairwise error probability in this case is more involved. The details 
for an MPSK constellation can be found in Divsalar and Simon (1988a) and Jamali and 
Le-Ngoc (1994). The result for Ricean fading is given by 



(14.4-46) 


where 



V~K cos (9) 


(14.4-47) 
At high signal-to-noise ratios and moderate to low values of K, this expression can
be further simplified and can be written in the following form


2e sr-' 

du 2^ 


(K + l)e~ 


jtJm, 


m % n 


r mm ’ SNR 


(14.4-48) 


where du = du(x m , x m >) is the Hamming distance between x m and x m > and x m = 


—j=x m and x m t = —j=x m ’. The signal-to-noise ratio is defined as SNR = jf. For the 
special case of a Rayleigh fading channel for which K = 0, this bound becomes 


2e 

du 2^ j 




X m ~ X„ 


y mm ' SNR 


(14.4-49) 


14.5 TRELLIS-CODED MODULATION FOR FADING CHANNELS

Our discussion in Section 14.4 shows that in design of good codes for fading channels it 
is important to consider code parameters that are different from the parameters consid- 
ered for code design on Gaussian channels. We recall that for code design on Gaussian 
channels, when soft decision decoding is employed, two parameters determine the 
performance of the code. These parameters are 

1. The minimum Euclidean distance of the code. This is the dominating factor that 
determines the performance of the code, particularly at high signal-to-noise ratios. 

2. The multiplicity of the code, i.e., the number of codewords that are at low Euclidean 
distance, and particularly at minimum Euclidean distance, from a given codeword. 
This parameter is particularly important at low signal-to-noise ratios. Turbo codes 
are examples of codes with low multiplicity that contributes to their excellent per- 
formance at low SNRs. 

For fading channels the code parameters with highest impact on code performance are 

1. The code diversity order or effective length, given by the minimum Hamming dis- 
tance of the code. This determines the slope of the error probability plot and is 
particularly the determining factor at high signal-to-noise ratios. 

2. The product distance of the code as defined by Equation 14.4-34, which determines
the coding gain defined by Equations 14.4-38 and 14.4-41. This parameter results in
a shift in the error probability plot of the code and has the same effect at all
signal-to-noise ratios. It is interesting to note that the effect of increasing the product distance
on the coding gain is more pronounced at lower diversity orders. This is due to the
effect of the $1/d_{\min}$ exponent in Equation 14.4-41. For instance, doubling the product
distance in a code with diversity order of 2 increases the coding gain by 1.5 dB,
whereas in a code with diversity order of 4, the same increase in the product distance
improves the coding gain by 0.75 dB.

3. The multiplicity of the code $N_{\min}$, i.e., the total number of codewords at minimum
diversity order and product distance. This factor affects the performance of the code
at low signal-to-noise ratios.
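The 1.5 dB and 0.75 dB figures quoted in item 2 follow directly from the $1/d_{\min}$ exponent in Equation 14.4-41: scaling the product distance by a factor $c$ scales the coding gain by $c^{1/d_{\min}}$. A minimal sketch (the function name is hypothetical):

```python
import math


def coding_gain_increase_db(product_distance_factor, diversity_order):
    """Change in coding gain, in dB, when the product distance is scaled by a factor."""
    return 10 * math.log10(product_distance_factor) / diversity_order


gain_order_2 = coding_gain_increase_db(2, 2)  # doubling at diversity order 2
gain_order_4 = coding_gain_increase_db(2, 4)  # doubling at diversity order 4
```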


14.5-1 TCM Systems for Fading Channels 

Trellis-coded modulation was described in Section 8.12 as a means for achieving a
coding gain on bandwidth-constrained channels, where we wish to transmit at a bit
rate-to-bandwidth ratio R/W > 1. For such channels, the digital communication system
is designed to use bandwidth-efficient multilevel or multiphase modulation (PAM,
PSK, DPSK, or QAM), which allows us to achieve an R/W > 1. When coding is
applied in signal design for a bandwidth-constrained channel, a coding gain is desired
without expanding the signal bandwidth. This goal can be achieved, as described in
Section 8.12, by increasing the number of signal points in the constellation over the
corresponding uncoded system, to compensate for the redundancy introduced by the code,
and designing the trellis code so that the Euclidean distance in a sequence of transmitted
symbols corresponding to paths that merge at any node in the trellis is larger than the
Euclidean distance per symbol in an uncoded system. In contrast, traditional coding
schemes used on fading channels in conjunction with FSK or PSK modulation expand
the bandwidth of the modulated signal for the purpose of achieving signal diversity.

In designing trellis-coded signal waveforms for fading channels, we may use the 
same basic principles that we have learned and applied in the design of conventional 
coding schemes. In particular, the most important objective in any coded signal design 
for fading channels is to achieve as large a diversity order as possible. 

As indicated above, the candidate modulation methods that achieve high bandwidth
efficiency are M-ary PSK, DPSK, QAM, and PAM. The choice depends to a large extent
on the channel characteristics. If there are rapid amplitude variations in the received 
signal, QAM and PAM may be particularly vulnerable, because a wideband automatic 
gain control (AGC) must be used to compensate for the channel variations. In such a 
case, PSK or DPSK is more suitable, since the information is conveyed by the signal 
phase and not by the signal amplitude. DPSK provides the additional benefit that carrier 
phase coherence is required only over two successive symbols. However, there is an 
SNR degradation in DPSK relative to PSK. 

The discussion and the design criteria provided in Section 14.5 show that a good 
TCM code for the Gaussian channel is not necessarily a good code for the fading 
channel. It is quite possible that a trellis code has a large Euclidean distance but has 
a low effective code length or product distance. In particular some of the good codes 
designed by Ungerboeck for the Gaussian channel (Ungerboeck (1983)) have parallel 
branches in their trellises. The existence of parallel branches in TCM codes is due to 
the existence of uncoded bits, as explained in Chapter 8. Obviously, two paths in the
trellis that coincide on all branches except one, where they follow different parallel
branches, have a Hamming distance of 1 and provide a diversity order of unity. Such codes
are not desirable for transmission over fading channels due to their low diversity order
and should be avoided. This is not, however, a problem with the Gaussian channel, and
in fact many good TCM schemes that work satisfactorily on Gaussian channels have
parallel branches in their trellis representation.

To design TCM schemes with high diversity order, we have to make sure that the
paths in the trellis corresponding to different code sequences have long runs of different
branches, and the branches are labeled by different symbols from the code constellation.
In order for two code sequences to have a diversity order of L, the corresponding paths
in the code trellis must remerge at least L branches after diverging, and the two paths
on these L branches must have different labels. This clearly indicates that for L > 1
parallel transitions have to be excluded.

Let us consider an $(n, k, K)$ convolutional code as shown in Figure 8.1-1. The
number of memory elements in this code is $Kk$, the number of states in the trellis
representing this code is $2^{k(K-1)}$, and $2^k$ branches enter and leave each state of the
trellis. Without loss of generality we consider the all-zero path and a path diverging
from it. The diverging path from the all-zero path corresponds to an input of $k$ bits
that contains at least one 1. Since the number of memory elements of the code is $Kk$,
it takes $K$ sequences of $k$-bit inputs, all equal to zero, to move the 1 (or 1s) out of
the $kK$ memory units, thus bringing back the code to the all-zero state and remerging
the path with the all-zero path. This shows that the two paths that have emerged from
one state can remerge after at least $K$ branches, and hence this code can potentially
provide a diversity order of $K$. Therefore, the diversity order that a convolutional code
can provide is equal to $K$, the constraint length of the convolutional code. To employ
this potential diversity order, we need to have enough points in the signal constellation
to assign different signal points to different branches of the trellis.

Let us consider the following trellis code studied by Wilson and Leung (1987). The
trellis diagram and the constellation for this TCM scheme are shown in Figure 14.5-1.
As seen in the figure, the trellis corresponding to this code is a fully connected trellis,
and there are no parallel branches on it, i.e., each branch of the trellis corresponds to
a single point in the constellation. The diversity order for this trellis is 2; therefore
the error probability is inversely proportional to the square of the signal-to-noise ratio.
The product distance provided by this code is 1.172. It can be easily verified that the
squared free Euclidean distance for this code is $d^2_{\text{free}} = 2.586$; therefore the coding
gain of the TCM scheme in Figure 14.5-1, when used for transmission over an AWGN
channel, is 1.1 dB, which is 1.9 dB inferior to the coding gain of the Ungerboeck code
of comparable complexity given in Section 8.12.

FIGURE 14.5-1

A TCM scheme for fading channels.
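The numbers quoted for this code are consistent with a minimum-distance error event of length 2 whose two differing symbols are separated by one and two steps of a unit-energy 8-PSK constellation, respectively. This pairing is an assumption inferred from the quoted values, since the trellis labels of the figure are not reproduced here; the reference for the coding gain is taken to be uncoded QPSK with squared minimum distance 2. A short check under these assumptions:

```python
import cmath
import math


def psk8_squared_distance(steps):
    """Squared Euclidean distance between unit-energy 8-PSK points `steps` apart."""
    return abs(1 - cmath.exp(2j * math.pi * steps / 8)) ** 2


d2_one_step = psk8_squared_distance(1)    # about 0.586
d2_two_steps = psk8_squared_distance(2)   # 2.0

product_distance = d2_one_step * d2_two_steps      # about 1.172
d2_free = d2_one_step + d2_two_steps               # about 2.586
awgn_gain_db = 10 * math.log10(d2_free / 2.0)      # relative to uncoded QPSK (assumed reference)
```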

In Schlegel and Costello (1989) a class of 8-PSK rate 2/3 TCM codes for various 
constraint lengths is introduced. The search for good codes in this work is done among 
all codes that can be designed by employing a systematic convolutional code followed by 
mapping to the 8-PSK signal constellation. It turns out that the advantage of this design 
procedure is more noticeable at higher constraint lengths. In particular, this design 
approach results in the same codes obtained by Ungerboeck (1983) when the constraint 
length is small. At high constraint lengths these codes are capable of providing both 
higher diversity orders and higher product distances compared to the codes designed 
by Ungerboeck. For example, for a trellis with 1024 states, these codes can provide a 
diversity order of 5 and a (normalized) product distance of 128. For comparison, the 
Ungerboeck code with the same complexity can provide a diversity order of 4 and a 
product distance of 32. 

In Du and Vucetic (1990), Gray coding is employed in the mapping from a convo- 
lutional code output to the signal constellation. An exhaustive search is performed on 
8-PSK TCM schemes, and it is shown that, particularly at lower constraint lengths, these 
codes have a better performance compared to those designed in Schlegel and Costello 
(1989). As the number of states increases, the performance of the codes designed in 
Schlegel and Costello (1989) is better. As an example for a 32-state trellis code, the 
approach of Du and Vucetic (1990) results in a diversity order of 3 and a normalized 
product distance of 32, whereas the corresponding figures for the code designed in 
Schlegel and Costello (1989) are 3 and 16, respectively. 

In Jamali and Le-Ngoc (1991), not only is the design problem of good 4-state 8-PSK 
trellis codes addressed, but also general design rules are formulated for the Rayleigh 
fading channel. These design principles can be viewed as the generalization of the 
design rules formulated in Ungerboeck (1983) for the Gaussian channel. Application 
of these rules results in improved performance. As an example, by applying these rules 
one obtains the signal constellation and the trellis shown in Figure 14.5-2. 



FIGURE 14.5-2 

The improved TCM scheme.

It is easy to verify that the coding gain of this code over an AWGN channel (as
expressed by the free Euclidean distance) is 2 dB, which is 0.9 dB superior to the code
designed in Wilson and Leung (1987) and shown in Figure 14.5-1, and only 1 dB
inferior to the Ungerboeck code with a comparable complexity. It is also easy to see
that the product distance of this code is twice the product distance of the code shown
in Figure 14.5-1, and therefore the performance of this code over a fading channel is
superior to the performance of the code designed in Wilson and Leung (1987). Since
the squared product distance of this code can be shown to be twice the squared product
distance of the code shown in Figure 14.5-1, the asymptotic performance improvement
of this code compared to the one designed in Wilson and Leung (1987), when used
over fading channels, is $10 \log_{10} \sqrt{2} = 1.5$ dB. The encoder for this code can be realized
by a convolutional encoder followed by a natural mapping to the 8-PSK signal set.


14.5-2 Multiple Trellis-Coded Modulation (MTCM) 

We have seen that the performance of trellis-coded modulation schemes on fading channels
is primarily determined by their diversity order and product distance. In particular,
we saw that trellises with parallel branches are to be avoided in transmission over fading 
channels due to their low (unity) diversity order. In cases where high bit rates are to 
be transmitted under severe bandwidth restrictions, the signal constellation consists of 
many signal points. In such cases, to avoid parallel paths in the code trellis, the number 
of trellis states should be very large, resulting in a very complex decoding scheme. 

An innovative approach to avoid parallel branches and at the same time to avoid 
a very large number of states is to employ multiple trellis-coded modulation (MTCM) 
as first formulated in Divsalar and Simon (1988c). The block diagram for a multiple 
trellis-coded modulation is shown in Figure 14.5-3. 

In the multiple trellis-coded modulation depicted in Figure 14.5-3, at each instant
of time $K = km$ information bits enter the trellis encoder and are mapped into
$N = nm$ bits, which correspond to $m$ signals from a signal constellation with a total of
$2^n$ signal points, and these $m$ signals are transmitted over the channel. The important
fact is that, unlike the standard TCM, here each branch of the trellis is labeled with $m$
signals from the constellation and not only one signal. The existence of more than one
signal corresponding to each trellis branch results in higher diversity order and therefore
improved performance when used over fading channels. In fact, MTCM schemes can
have a relatively small number of states and at the same time avoid a reduced diversity
order. The throughput (or spectral bit rate, defined as the ratio of the bit rate to the
bandwidth) for this system is $k$, which is equivalent to an uncoded (and a conventional
TCM) system. In most implementations of MTCM, the value of $n$ is selected to be
$k + 1$. Note that with this choice, the case $m = 1$ is equivalent to conventional TCM.
The rate of the MTCM code is $R = K/N = k/n$.

FIGURE 14.5-3

Block diagram of a multiple trellis-coded modulation scheme.

In the following example we give a specific MTCM scheme and discuss its performance
in a fading environment. The signal constellation and the trellis for this example
are shown in Figure 14.5-4. For this code we assume $m = 2$, $k = 2$, and $n = 3$.
Therefore, the rate of this code is 2/3, and the trellis selected for the code is a two-state
trellis. At each instant of time $K = km = 4$ information bits enter the encoder. This
means that there are $2^K = 16$ branches leaving each state of the trellis. Due to the
symmetry in the structure of the trellis, there exist eight parallel branches connecting
any two states of the trellis. The difference, however, with conventional trellis-coded
modulation is that here we assign two signals in the signal space to each branch of the
trellis. In fact, corresponding to the $K = 4$ information bits that enter the encoder,
$N = nm = 6$ binary symbols leave the encoder. These six binary symbols are used to
select two signals from the 8-PSK constellation shown in Figure 14.5-4 (each signal
requires three binary symbols). The mappings of the branches to the binary symbols
are also shown in Figure 14.5-4. Close examination of the mappings suggested in this
figure shows that although there exist parallel branches in the trellis for this code, the
diversity order provided by this code is equal to 2.

FIGURE 14.5-4

An example of multiple trellis-coded modulation.
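The bookkeeping in this example can be reproduced with a short calculation. The sketch below assumes, as in the example, a fully connected trellis in which the branches leaving a state are divided equally among the destination states; the function name and the returned field names are hypothetical.

```python
from fractions import Fraction


def mtcm_parameters(k, n, m, num_states):
    """Basic MTCM quantities for per-signal parameters k, n and multiplicity m."""
    info_bits = k * m                      # K = km bits entering per trellis branch
    coded_bits = n * m                     # N = nm bits leaving per trellis branch
    branches_per_state = 2 ** info_bits    # 2^K branches leave each state
    return {
        "K": info_bits,
        "N": coded_bits,
        "rate": Fraction(info_bits, coded_bits),
        "branches_per_state": branches_per_state,
        # Equal split among destination states (assumption: fully connected trellis).
        "parallel_branches": branches_per_state // num_states,
        "signals_per_branch": m,
    }


params = mtcm_parameters(k=2, n=3, m=2, num_states=2)
```

For the two-state example this gives 16 branches per state and 8 parallel branches between any two states, each labeled with two 8-PSK signals; it is this pair of signals per branch that allows parallel branches to differ in both positions and so retain a diversity order of 2.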

It is seen from the above example that multiple trellis-coded modulation can achieve
good diversity, which is essential for transmission through the fading channel, without
requiring complex trellises with a large number of states. It can also be shown (see
Divsalar and Simon (1988c)) that this same technique can provide all the benefits of
using the asymmetric signal sets, as described in Divsalar et al. (1987), without the
difficulties encountered with time jitter and catastrophic trellis codes. Optimum set
partitioning rules for multiple trellis-coded modulation schemes are investigated in Divsalar
and Simon (1988b) (see also Biglieri et al. (1991)). It is important to note that the signal
set assignments to the trellis branches shown in Figure 14.5-4 are not the best possible
signal assignments if this code is to be used over an AWGN channel. In fact, the signal
set assignment shown in Figure 14.5-5 provides a performance 1.315 dB superior to the
signal set assignment of Figure 14.5-4 when used over an AWGN channel. However,
the signal assignment of Figure 14.5-5 can only provide a diversity order
equal to unity as opposed to the diversity order of 2 provided by the signal assignment of
Figure 14.5-4. This means that on fading channels the performance of the code shown
in Figure 14.5-4 is superior to the performance of the code shown in Figure 14.5-5.




FIGURE 14.5-5 

Signal assignment for an MTCM scheme appropriate for transmission over an AWGN channel.

In Section 8.12 we have seen that a coded modulation system in which coding and
modulation are jointly designed as a single entity provides good coding gain over
Gaussian channels with no expansion in bandwidth. These codes employ labeling by
set partitioning on the code trellis rather than common labeling techniques such as
Gray labeling, and these codes achieve their good performance over Gaussian channels
by providing large Euclidean distance between trellis paths corresponding to different
coded sequences. On the other hand, a code has good performance on a fading
channel if it can provide high diversity order, which depends on the minimum Hamming
distance of the code, as was seen in Section 14.4-1. For a code to have good
performance under both channel models, it has to provide high Euclidean and high
Hamming distances. We have previously seen in Chapter 7 that for BPSK and BFSK
modulation schemes the relation between Euclidean and Hamming distances is a simple
relation given by Equations 7.2-15 and 7.2-17, respectively. These equations indicate
that for these modulation schemes Euclidean and Hamming distances are optimized
simultaneously.

For coded modulation where expanded signal sets are employed, the relation between 
Euclidean and Hamming distances is not as simple as the corresponding relations 
for BPSK and BFSK. In fact, in many coded modulation schemes, where the performance 
is optimized through labeling the trellis branches by set partitioning using 
Ungerboeck's rules (Ungerboeck (1983)), optimal Euclidean distance, and hence optimal 
performance on the AWGN channel model, is achieved with TCM schemes that 
have parallel branches and thus have a Hamming distance, and consequently diversity 
order, equal to unity. These codes cannot perform well on fading channels. 
In Section 14.5 we gave examples of coded modulation schemes designed for fading 
channels that achieve good diversity gain on these channels. The underlying assumption 
in designing these codes was that similar to Ungerboeck’s coded modulation approach, 
the modulation and coding have to be considered as a single entity, and the symbols 
have to be interleaved by a symbol interleaver of depth usually many times the coher- 
ence time of the channel to guarantee maximum diversity. Using symbol interleavers 
results in the diversity order of the code being equal to the minimum number of distinct 
symbols between the codewords; and as we have seen in Section 14.5-1, this can be 
done by eliminating parallel transitions and increasing the constraint length of the code. 
However, there is no guarantee that the codes using this approach perform well when 
transmitted over an AWGN channel model. In this section we introduce a coded mod- 
ulation scheme, called bit-interleaved coded modulation (BICM), that achieves robust 
performance under both fading and AWGN channel models. 

Bit-interleaved coded modulation was first introduced by Zehavi (1992), who placed 
a bit interleaver instead of a symbol interleaver at the output of the channel 
encoder and before the modulator. The idea of introducing a bit interleaver is to make 
the diversity order of the code equal to the minimum number of distinct bits (rather 
than channel symbols) by which two trellis paths differ. Using this scheme results in a 
new soft decision decoding metric for optimal decoding that is different from the metric 
used in standard coded modulation. A consequence of this approach is that coding and 
modulation can be done separately. Separate coding and modulation results in a system 
that is not optimal in terms of achieving the highest minimum Euclidean distance, and 
therefore the resulting code is not optimal when used on an AWGN channel. However, 
the diversity order provided by these codes is generally higher than the diversity order 
of codes obtained by set partitioned labeling and thus provides improved performance 
over fading channels. Block diagrams of a standard TCM system and a bit-interleaved 
coded modulation system are shown in Figure 14.6-1. In both systems a rate 2/3 convolutional 
code with an 8-PSK constellation is employed. In the TCM system, the symbol 
outputs of the encoder are interleaved and then modulated using the 8-PSK constellation 
and transmitted over the fading channel, in which ρ and n denote the fading and noise 
processes. In the BICM system, instead of the symbol interleaver, three 
independent bit interleavers individually interleave the three bit streams. In both 
systems deinterleavers (at symbol and bit level, respectively) are used at the receiver 
to undo the effect of interleaving. Note that the fading process (CSI) is available at the 
receiver in both systems. 

Bit-interleaved coded modulation was extensively studied in Caire et al. (1998). 
This comprehensive study generalized the system introduced by Zehavi (1992), which 
used multiple bit interleavers at the output of the encoder, and instead used a single bit 
interleaver that operates on the entire encoder output. The block diagram of the system 
studied in Caire et al. (1998) is shown in Figure 14.6-2. 

FIGURE 14.6-1 

A TCM system (left) and a BICM system (right). [From Zehavi (1992) copyright IEEE.] 

FIGURE 14.6-2 

The BICM system studied in Caire et al. (1998). [From Caire et al. (1998) copyright IEEE.] 

The encoder output is applied to an interleaver denoted by $\pi$. The output of the 
interleaver is modulated by the modulator, which consists of a label map $\mu$ followed by a 
signal set $\mathcal{X}$. The channel model is a state channel with state $\mathbf{s}$, which is assumed to 
be a stationary, finite-memory vector channel whose input and output symbols $\mathbf{x}$ and 
$\mathbf{y}$ are $N$-tuples of complex numbers. The state $\mathbf{s}$ is independent of the channel input $\mathbf{x}$, 
and conditioned on $\mathbf{s}$, the channel is memoryless, i.e., 

$$p(\mathbf{y} \mid \mathbf{x}, \mathbf{s}) = \prod_{i=1}^{N} p(y_i \mid x_i, s_i) \tag{14.6-1}$$

The state sequence $\mathbf{s}$ is assumed to be a stationary finite-memory random process; 
i.e., there exists some integer $\nu > 0$ such that for all integers $r$ and $s$ and all integers 
$\nu < k_1 < k_2 < \cdots < k_r$ and $j_1 < j_2 < \cdots < j_s < 0$, the sequences $(s_{k_1}, \ldots, s_{k_r})$ and 
$(s_{j_1}, \ldots, s_{j_s})$ are independent. The integer $\nu$ represents the maximum memory length 
of the state process. The output of the channel enters the demodulator, which computes 
the branch metrics; after deinterleaving, these are supplied to the decoder for the final 
decision. 

Both coded modulation and BICM systems can be described as special cases of 
the block diagram of Figure 14.6-2. A coded modulation system results when the 
encoder is defined over the label alphabet $\mathcal{A}$, and $\mathcal{A}$ and $\mathcal{X} \subset \mathbb{C}^N$ have the same 
cardinality, i.e., when $|\mathcal{A}| = |\mathcal{X}| = M$. The labeling map $\mu : \mathcal{A} \to \mathcal{X}$ acts on symbol 
interleaved encoder outputs individually. For Ungerboeck codes the encoder is a rate 
$k/n$ convolutional code, and $\mathcal{A}$ is the set of binary sequences of length $n$. The labeling 
function $\mu$ is obtained by applying the set partitioning rules to $\mathcal{X}$. 

In BICM, a binary code is employed and its output is bit-interleaved. After interleaving, 
the bit sequence is broken into subsequences of length $n$, and each is mapped 
onto a constellation $\mathcal{X} \subset \mathbb{C}^N$ of size $|\mathcal{X}| = M = 2^n$ using a mapping $\mu : \{0, 1\}^n \to \mathcal{X}$. 

Let $x \in \mathcal{X}$ and let $\ell^i(x)$ denote the $i$th bit of the label of $x$; obviously $\ell^i(x) \in \{0, 1\}$. 
We define 

$$\mathcal{X}_b^i = \{x \in \mathcal{X} : \ell^i(x) = b\} \tag{14.6-2}$$

where $\mathcal{X}_b^i$ denotes the set of all points in the constellation whose label is equal to 
$b \in \{0, 1\}$ at position $i$. It can be easily seen that if $P[b = 0] = P[b = 1] = 1/2$, then 

$$p(y \mid \ell^i(x) = b, s) = 2^{-(n-1)} \sum_{x \in \mathcal{X}_b^i} p(y \mid x, s) \tag{14.6-3}$$

The computation of the bit metrics at the demodulator depends on the availability 
of the channel state information. If CSI is available at the receiver, then the bit metric 
for the $i$th bit of the symbol at time $k$ is given by the log-likelihood 

$$\lambda^i(y_k, b) = \log \sum_{x \in \mathcal{X}_b^i} p(y_k \mid x, s) \tag{14.6-4}$$

and for the case with no CSI we have 

$$\lambda^i(y_k, b) = \log \sum_{x \in \mathcal{X}_b^i} p(y_k \mid x) \tag{14.6-5}$$

where $b \in \{0, 1\}$ and $1 \le i \le n$. In the bit metric calculation for the no CSI case, we 
have 

$$p(y_k \mid x) = \int p(y_k \mid x, s)\, p(s)\, ds \tag{14.6-6}$$

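Finally, the decoder uses the ML bit metrics to decode the codeword $\mathbf{c} \in \mathcal{C}$ according 
to 

$$\hat{\mathbf{c}} = \arg\max_{\mathbf{c} \in \mathcal{C}} \sum_{k} \sum_{i=1}^{n} \lambda^i(y_k, c_k^i) \tag{14.6-7}$$

where $c_k^i$ denotes the coded bit carried in position $i$ of the symbol at time $k$. This 
maximization can be implemented using the Viterbi algorithm. 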

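A simpler version of the bit metrics can be found using the approximation 

$$\log \sum_j a_j \approx \max_j \log a_j \tag{14.6-8}$$

which is similar to Equation 8.8-33. With this approximation we have the approximate 
bit metric 

$$\tilde{\lambda}^i(y_k, b) = \begin{cases} \max\limits_{x \in \mathcal{X}_b^i} \log p(y_k \mid x, s) & \text{CSI available} \\ \max\limits_{x \in \mathcal{X}_b^i} \log p(y_k \mid x) & \text{no CSI} \end{cases} \tag{14.6-9}$$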
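The exact and approximate bit metrics with CSI (Equations 14.6-4 and 14.6-9) can be compared numerically. The following Python sketch is only an illustration under stated assumptions: a Gray-labeled, unit-energy QPSK constellation ($n = 2$), a received sample $y = s x + \text{noise}$ with known real fading gain $s$, and complex Gaussian noise of variance `n0`. The function names are hypothetical and do not come from the text.

```python
import math

# Assumed example constellation: Gray-labeled QPSK, unit energy, n = 2 bits/symbol.
R = 1 / math.sqrt(2)
CONSTELLATION = {
    (0, 0): complex(R, R),
    (0, 1): complex(-R, R),
    (1, 1): complex(-R, -R),
    (1, 0): complex(R, -R),
}

def log_likelihood(y, x, s, n0):
    """log p(y | x, s) for y = s*x + noise, complex Gaussian noise of variance n0."""
    return -abs(y - s * x) ** 2 / n0 - math.log(math.pi * n0)

def subset(i, b):
    """Points whose label has bit b at position i (the set of Eq. 14.6-2)."""
    return [x for label, x in CONSTELLATION.items() if label[i] == b]

def bit_metric(y, i, b, s, n0):
    """Exact bit metric with CSI (Eq. 14.6-4): log of a sum of likelihoods."""
    terms = [log_likelihood(y, x, s, n0) for x in subset(i, b)]
    m = max(terms)  # log-sum-exp form for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))

def bit_metric_maxlog(y, i, b, s, n0):
    """Approximate bit metric with CSI (Eq. 14.6-9): keep only the largest term."""
    return max(log_likelihood(y, x, s, n0) for x in subset(i, b))
```

Because each subset here contains two points, the approximate metric can fall short of the exact one by at most $\log 2$; the gap shrinks as the SNR grows and one term dominates the sum.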


It turns out that BICM performs better when it is used with Gray labeling as 
opposed to labeling induced by the set partitioning rules. The Gray and set partitioning 
labelings for the 16-QAM constellation are shown in Figure 14.6-3. Gray labeling is possible 
only for certain constellations. For instance, Gray labeling is not possible for a 32-QAM 
constellation. In such cases a quasi-Gray labeling achieves good performance. 

The channel model for BICM, when ideal interleaving is employed, is a set of n 
independent memoryless parallel channels with binary inputs that are connected via a 
random switch to the encoder output. Each channel corresponds to one particular bit 
position from the total n bits. The capacity and the cutoff rate for this channel model 
under the assumption of full CSI at the receiver and no CSI are computed in Caire et al. 
(1998). Figure 14.6-4 shows the cutoff rate for different BICM systems for different 
QAM signaling schemes over AWGN and Rayleigh fading channels. 

FIGURE 14.6-3 

Set partitioning labeling (a) and Gray labeling (b) for 
16-QAM signaling. [From Caire et al. (1998), copyright 
IEEE.] 

Comparison of these figures shows that for the AWGN channel the performance of 
coded modulation is superior to the performance of BICM at all signal-to-noise ratios. 
The performance difference is particularly large for larger constellations and lower-rate 
codes. For the Rayleigh fading channel BICM outperforms coded modulation at all rates 
above 1 bit per dimension. The difference in performance is particularly noticeable for 
larger constellations and higher rates. Similar results can be obtained for orthogonal 
signals and noncoherent detection. 

Table 14.6-1 summarizes the performance parameters of various TCM and BICM 
schemes with comparable complexity. It is seen that using BICM generally improves 
the Hamming distance and results in higher diversity order. At the same time BICM 
marginally reduces the Euclidean distance, resulting in performance deterioration on 
AWGN channels. This indicates that BICM is a good candidate for channels with 
variations in the channel model. For instance, Ricean fading channels with varying 
Rice factor operate somewhere between Rayleigh fading and Gaussian channels. For 
these channels BICM is an attractive coding scheme displaying robustness to changes 
in channel characteristics. 

For more details on BICM, the interested reader is referred to Caire et al. (1998), 
Ormeci et al. (2001), Martinez et al. (2006), and Li and Ritcey (1997, 1998, 1999). 

FIGURE 14.6-4 

Cutoff rate plots of coded modulation (CM) and BICM for Gray (or quasi-Gray) labeling over 
AWGN (top) and Rayleigh fading channel (bottom). [From Caire et al. (1998), copyright 
IEEE.] 

TABLE 14.6-1 

Upper Bounds to Minimum Euclidean Distance 
and Diversity Order for TCM and BICM for 
16-QAM Signaling. Average Energy is 
Normalized to 1 and Transmission Rate is 3 Bits 
per Complex Dimension. 


Encoder 

memory 

BICM 

TCM 

4 

42 ( C ) 

4 

4 m ( C ) 

2 

1.2 
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1 

3 

1.6 

4 
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1.6 

4 

2.8 

2 
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2 
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2.4 
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Source: From Caire et al. (1998), copyright IEEE. 


■ 14.7 

CODING IN THE FREQUENCY DOMAIN 

Instead of bitwise or symbolwise interleaving in the time domain to increase diversity 
of a coded system and improve the performance over a fading channel, we can achieve 
similar diversity order by spreading the transmitted signal components in the frequency 
domain. A candidate modulation scheme for this case is FSK which can be demodulated 
noncoherently when tracking the channel phase is not possible. 

A model for this communication scheme is shown in Figure 14.3-1, where each 
code word bit $c_{ij}$ is mapped into FSK signal waveforms in the following way. If $c_{ij} = 0$, the 
tone $f_{0j}$ is transmitted; and if $c_{ij} = 1$, the tone $f_{1j}$ is transmitted. This means that $2n$ 
tones or cells are available to transmit the $n$ bits of the code word, but only $n$ tones are 
transmitted in any signaling interval. 

The demodulator for the received signal separates the signal into $2n$ spectral components 
corresponding to the available tone frequencies at the transmitter. Thus, the 
demodulator can be realized as a bank of $2n$ filters, where each filter is matched to 
one of the possible transmitted tones. The outputs of the $2n$ filters are detected noncoherently. 
Since the Rayleigh fading and the additive white Gaussian noises in the $2n$ 
frequency cells are mutually statistically independent and identically distributed random 
processes, the optimum maximum-likelihood soft decision decoding criterion 
requires that these filter responses be square-law-detected and appropriately combined 
for each code word to form the $M = 2^k$ decision variables. The code word 
corresponding to the maximum of the decision variables is selected. If hard decision 
decoding is employed, the optimum maximum-likelihood decoder selects the 
code word having the smallest Hamming distance relative to the received code word. 
Either a block or a convolutional code can be employed as the underlying code in this 
system. 

14.7-1 Probability of Error for Soft Decision Decoding 
of Linear Binary Block Codes 


Consider the decoding of a linear binary $(n, k)$ code transmitted over a Rayleigh fading 
channel, as described above. The optimum soft-decision decoder, based on the 
maximum-likelihood criterion, forms the $M = 2^k$ decision variables 

$$\begin{aligned} U_i &= \sum_{j=1}^{n} \left[ (1 - c_{ij}) |y_{0j}|^2 + c_{ij} |y_{1j}|^2 \right] \\ &= \sum_{j=1}^{n} \left[ |y_{0j}|^2 + c_{ij} \left( |y_{1j}|^2 - |y_{0j}|^2 \right) \right], \qquad i = 1, 2, \ldots, 2^k \end{aligned} \tag{14.7-1}$$

where $|y_{rj}|^2$, $j = 1, 2, \ldots, n$, and $r = 0, 1$ represent the squared envelopes at the 
outputs of the $2n$ filters that are tuned to the $2n$ possible transmitted tones. A decision 
is made in favor of the code word corresponding to the largest decision variable of the 
set $\{U_i\}$. 

Our objective in this section is the determination of the error rate performance of 
the soft-decision decoder. Toward this end, let us assume that the all-zero code word $\mathbf{c}_1$ 
is transmitted. The average received signal-to-noise ratio per tone (cell) is denoted by 
$\bar{\gamma}_c$. The total received SNR for the $n$ tones is $n\bar{\gamma}_c$ and, hence, the average SNR per bit is 

$$\bar{\gamma}_b = \frac{n}{k} \bar{\gamma}_c = \frac{\bar{\gamma}_c}{R_c} \tag{14.7-2}$$

where $R_c$ is the code rate. 

The decision variable $U_1$ corresponding to the code word $\mathbf{c}_1$ is given by 
Equation 14.7-1 with $c_{1j} = 0$ for all $j$. The probability that a decision is made in 
favor of the $m$th code word is just 

$$\begin{aligned} P_2(m) &= P(U_m > U_1) = P(U_1 - U_m < 0) \\ &= P\left[ \sum_{j=1}^{n} (c_{1j} - c_{mj}) \left( |y_{1j}|^2 - |y_{0j}|^2 \right) < 0 \right] \\ &= P\left[ \sum_{j=1}^{w_m} \left( |y_{0j}|^2 - |y_{1j}|^2 \right) < 0 \right] \end{aligned} \tag{14.7-3}$$

where $w_m$ is the weight of the $m$th code word. But the probability in Equation 14.7-3 
is just the probability of error for square-law combining of binary orthogonal FSK with 
$w_m$th-order diversity. That is, 

$$P_2(m) = p^{w_m} \sum_{k=0}^{w_m - 1} \binom{w_m - 1 + k}{k} (1 - p)^k \tag{14.7-4}$$

$$\le p^{w_m} \sum_{k=0}^{w_m - 1} \binom{w_m - 1 + k}{k} = \binom{2w_m - 1}{w_m} p^{w_m} \tag{14.7-5}$$

where 

$$p = \frac{1}{2 + \bar{\gamma}_c} = \frac{1}{2 + R_c \bar{\gamma}_b} \tag{14.7-6}$$
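To make Equations 14.7-4 through 14.7-6 concrete, the following Python sketch evaluates the exact pairwise error probability and the simpler bound of Equation 14.7-5 for a given weight, and also includes the Chernov bound $[4p(1-p)]^{w}$ that is commonly used in its place. It is a minimal illustration, not a reference implementation; the function names are hypothetical, and the SNR per bit is assumed to be supplied in linear units rather than in dB.

```python
from math import comb

def cell_error_prob(gamma_b, rate):
    """p of Eq. 14.7-6: p = 1 / (2 + R_c * gamma_b), gamma_b in linear units."""
    return 1.0 / (2.0 + rate * gamma_b)

def p2_exact(w, p):
    """Eq. 14.7-4: pairwise error probability with w-th order diversity."""
    return p ** w * sum(comb(w - 1 + k, k) * (1 - p) ** k for k in range(w))

def p2_bound(w, p):
    """Eq. 14.7-5: upper bound obtained by dropping the (1 - p)^k factors."""
    return p ** w * comb(2 * w - 1, w)

def p2_chernov(w, p):
    """Chernov bound [4 p (1 - p)]^w on the same pairwise error probability."""
    return (4 * p * (1 - p)) ** w
```

For example, with $R_c = 1/2$ and an SNR per bit of 10 (linear), `cell_error_prob(10.0, 0.5)` gives $p = 1/7$, and `p2_exact(8, p)` can then be compared against the two bounds.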

As an alternative, we may use the Chernov upper bound derived in Section 13.4, which 
in the present notation is 

$$P_2(m) \le [4p(1 - p)]^{w_m} \tag{14.7-7}$$

The sum of the binary error events over the $M - 1$ nonzero-weight code words 
gives an upper bound on the probability of error. Thus, 

$$P_e \le \sum_{m=2}^{M} P_2(m) \tag{14.7-8}$$

Since the minimum distance of the linear code is equal to the minimum weight, it 
follows that 

$$(2 + R_c \bar{\gamma}_b)^{-w_m} \le (2 + R_c \bar{\gamma}_b)^{-d_{\min}}$$

The use of this relation in conjunction with Equations 14.7-5 and 14.7-8 yields a simple, 
albeit looser, upper bound that may be expressed in the form 

$$P_e < \frac{\sum_{m=2}^{M} \binom{2w_m - 1}{w_m}}{(2 + R_c \bar{\gamma}_b)^{d_{\min}}} \tag{14.7-9}$$

This simple bound indicates that the code provides an effective order of diversity equal 
to $d_{\min}$. An even simpler bound is the union bound 

$$P_e < (M - 1)[4p(1 - p)]^{d_{\min}} \tag{14.7-10}$$

which is obtained from the Chernov bound given in Equation 14.7-7. 

As an example serving to illustrate the benefits of coding for a Rayleigh fading 
channel, we have plotted in Figure 14.7-1 the performance obtained with the extended 
Golay (24,12) code and the performance of binary FSK and quaternary FSK each with 
dual diversity. Since the extended Golay code requires a total of 48 cells and $k = 12$, 
the bandwidth expansion factor $B_e = 4$. This is also the bandwidth expansion factor 
for binary and quaternary FSK with $L = 2$. Thus, the three types of waveforms are 
compared on the basis of the same bandwidth expansion factor. Note that at $P_b = 10^{-4}$, 
the Golay code outperforms quaternary FSK by more than 6 dB, and at $P_b = 10^{-5}$, the 
difference is approximately 10 dB. 

The reason for the superior performance of the Golay code is its large minimum 
distance ($d_{\min} = 8$), which translates into an equivalent eighth-order ($L = 8$) diversity. 
In contrast, the binary and quaternary FSK signals have only second-order diversity. 
Hence, the code makes more efficient use of the available channel bandwidth. The price 
that we must pay for the superior performance of the code is the increase in decoding 
complexity. 
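The union bound of Equation 14.7-8 can be evaluated for the extended Golay (24,12) code once its weight distribution is known. The sketch below assumes the standard weight distribution of that code (759 words of weight 8, 2576 of weight 12, 759 of weight 16, and one of weight 24) and bounds the code word error probability $P_e$, which is not the same quantity as the bit error probability plotted in the figure. The function names and the choice of dB as the input unit are illustrative assumptions.

```python
from math import comb

# Assumed weight distribution of the extended Golay (24,12) code
# (nonzero code words only): weight -> number of code words.
GOLAY_WEIGHTS = {8: 759, 12: 2576, 16: 759, 24: 1}
N, K = 24, 12

def p2(w, p):
    """Pairwise error probability of Eq. 14.7-4 for a code word of weight w."""
    return p ** w * sum(comb(w - 1 + k, k) * (1 - p) ** k for k in range(w))

def cell_error_prob(gamma_b_db):
    """p of Eq. 14.7-6 with R_c = K/N and the SNR per bit given in dB."""
    gamma_b = 10 ** (gamma_b_db / 10)
    return 1 / (2 + (K / N) * gamma_b)

def union_bound(gamma_b_db):
    """Eq. 14.7-8: sum of pairwise error probabilities over nonzero code words."""
    p = cell_error_prob(gamma_b_db)
    return sum(count * p2(w, p) for w, count in GOLAY_WEIGHTS.items())

def simple_bound(gamma_b_db, d_min=8):
    """Eq. 14.7-10: (M - 1) [4 p (1 - p)]^d_min."""
    p = cell_error_prob(gamma_b_db)
    return (2 ** K - 1) * (4 * p * (1 - p)) ** d_min
```

Evaluating both functions over a range of SNR values shows how much tightness is given up by the simpler bound, which depends only on $d_{\min}$.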
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FIGURE 14.7-1 

Example of performance obtained with 
conventional diversity versus coding for 
B e = 4. 


14.7-2 Probability of Error for Hard-Decision Decoding 
of Linear Block Codes 

Bounds on the performance obtained with hard-decision decoding of a linear binary 
$(n, k)$ code have already been given in Section 7.5-2. These bounds are applicable to 
a general binary-input, binary-output memoryless (binary symmetric) channel, and, 
hence, they apply without modification to a Rayleigh fading AWGN channel with 
statistically independent fading of the symbols in the code word. The probability of a 
bit error needed to evaluate these bounds when binary FSK with noncoherent detection 
is used as the modulation and demodulation technique is given by Equation 14.7-6. 

A particularly interesting result is obtained when we use the Chernov upper bound 
on the error probability for hard-decision decoding given by 

$$P_2(m) \le [4p(1 - p)]^{w_m/2} \tag{14.7-11}$$

and $P_e$ is upper-bounded by Equation 14.7-8. In comparison, the Chernov upper bound 
for $P_2(m)$ when soft-decision decoding is employed is given by Equation 14.7-7. We 
observe that the effect of hard-decision decoding is a reduction in the distance between 
any two code words by a factor of 2. When the minimum distance of a code is relatively 
small, the reduction of the distances by a factor of 2 is much more noticeable in a fading 
channel than in a nonfading channel. 
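The factor-of-2 loss in effective distance can be seen directly from the two Chernov bounds. The following Python sketch, offered only as an illustration, evaluates Equations 14.7-7 and 14.7-11 and uses a simple bisection to find the SNR per bit at which each bound reaches a target value. The helper names, the search interval of 0 to 100 dB, and the use of the bound itself (rather than the true error rate) as the performance measure are all assumptions.

```python
def chernov_bound(w, gamma_b_db, rate, hard=False):
    """Chernov bound on P_2 for weight w: Eq. 14.7-7 (soft) or Eq. 14.7-11 (hard)."""
    gamma_b = 10 ** (gamma_b_db / 10)
    p = 1 / (2 + rate * gamma_b)        # Eq. 14.7-6
    exponent = w / 2 if hard else w     # hard decisions halve the distance
    return (4 * p * (1 - p)) ** exponent

def snr_for_target(target, w, rate, hard=False, lo=0.0, hi=100.0):
    """Bisection on SNR per bit (dB) until the bound falls to the target value."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chernov_bound(w, mid, rate, hard) > target:
            lo = mid
        else:
            hi = mid
    return hi
```

With these definitions, the hard-decision bound for weight $2w$ coincides with the soft-decision bound for weight $w$, and `snr_for_target` returns a larger SNR for hard decisions than for soft decisions at the same weight.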

For illustrative purposes we have plotted in Figure 14.7-2 the performance of the 
Golay (23, 12) code when hard-decision and soft-decision decoding are used. The 
difference in performance at P/, = 1 0 is approximately 6 dB. This is a significant 
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FIGURE 14.7-2 

Comparison of performance between 
hard- and soft-decision decoding. 

difference in performance compared with the 2-dB difference between soft- and hard-decision 
decoding in a nonfading AWGN channel. We also note that the difference 
in performance increases as $P_b$ decreases. In short, these results indicate the benefits 
of soft-decision decoding over hard-decision decoding on a Rayleigh fading 
channel. 


14.7-3 Upper Bounds on the Performance of Convolutional 
Codes for a Rayleigh Fading Channel 

In this subsection, we derive the performance of binary convolutional codes when used 
on a Rayleigh fading AWGN channel. The encoder accepts $k$ binary digits at a time and 
puts out $n$ binary digits at a time. Thus, the code rate is $R_c = k/n$. The binary digits at 
the output of the encoder are transmitted over the Rayleigh fading channel by means of 
binary FSK, which is square-law-detected at the receiver. The decoder for either soft- 
or hard-decision decoding performs maximum-likelihood sequence estimation, which 
is efficiently implemented by means of the Viterbi algorithm. 

First, we consider soft-decision decoding. In this case, the metrics computed in the 
Viterbi algorithm are simply sums of square-law-detected outputs from the demodulator. 
Suppose the all-zero sequence is transmitted. Following the procedure outlined in 
Section 8.2-2, it is easily shown that the probability of error in a pairwise comparison 
of the metric corresponding to the all-zero sequence with the metric corresponding to 
another sequence that merges for the first time at the all-zero state is 

$$P_2(d) = p^{d} \sum_{k=0}^{d-1} \binom{d - 1 + k}{k} (1 - p)^k \tag{14.7-12}$$

where $d$ is the number of bit positions in which the two sequences differ and $p$ is 
given by Equation 14.7-6. That is, $P_2(d)$ is just the probability of error for binary 
FSK with square-law detection and $d$th-order diversity. Alternatively, we may use the 
Chernov bound in Equation 14.7-7 for $P_2(d)$. In any case, the bit error probability is 
upper-bounded, as shown in Section 8.2-2, by the expression 

$$P_b < \frac{1}{k} \sum_{d=d_{\text{free}}}^{\infty} \beta_d P_2(d) \tag{14.7-13}$$

where the weighting coefficients $\{\beta_d\}$ in the summation are obtained from the expansion 
of the first derivative of the transfer function $T(Y, Z)$, given by Equation 8.2-12. 

When hard-decision decoding is performed at the receiver, the bounds on the error 
rate performance for binary convolutional codes derived in Section 8.2-2 apply. That 
is, $P_b$ is again upper-bounded by the expression in Equation 14.7-13, where $P_2(d)$ is 
defined by Equation 8.2-16 for odd $d$ and by Equation 8.2-17 for even $d$, or upper-bounded 
(Chernov bound) by Equation 8.2-15, and $p$ is defined by Equation 14.7-6. 

As in the case of block coding, when the respective Chernov bounds are used for 
$P_2(d)$ with hard-decision and soft-decision decoding, it is interesting to note that the 
effect of hard-decision decoding is to reduce the distances (diversity) by a factor of 
2 relative to soft-decision decoding. 

The following numerical results illustrate the error rate performance of binary, 
rate $1/n$, maximal free distance convolutional codes for $n = 2$, 3, and 4 with soft-decision 
Viterbi decoding. First of all, Figure 14.7-3 shows the performance of the rate 
1/2 convolutional codes for constraint lengths 3, 4, and 5. The bandwidth expansion 
factor for binary FSK modulation is $B_e = 2n$. Since an increase in the constraint 
length results in an increase in the complexity of the decoder to go along with the 
corresponding increase in the minimum free distance, the system designer can weigh 
these two factors in the selection of the code. 

Another way to increase the distance without increasing the constraint length of 
the code is to repeat each output bit $m$ times. This is equivalent to reducing the code 
rate by a factor of $m$ or expanding the bandwidth by the same factor. The result is 
a convolutional code that has a minimum free distance of $m d_{\text{free}}$, where $d_{\text{free}}$ is the 
minimum free distance of the original code without repetitions. Such a code is almost 
as good, from the viewpoint of minimum distance, as a maximum free distance, rate 
$1/mn$ code. The error rate performance with repetitions is upper-bounded by 

$$P_b < \frac{1}{k} \sum_{d=d_{\text{free}}}^{\infty} \beta_d P_2(md) \tag{14.7-14}$$

where $P_2(md)$ is given by Equation 14.7-12. Figure 14.7-4 illustrates the performance 
of the rate 1/2 codes with repetitions ($m = 1, 2, 3, 4$) for constraint length 5. 


FIGURE 14.7-3 

Performance of rate 1/2 binary 
convolutional codes with soft-decision 
decoding. 

FIGURE 14.7-4 

Performance of rate 1/(2m), constraint 
length 5, binary convolutional codes with 
soft-decision decoding. 

14.7-4 Use of Constant-Weight Codes and Concatenated Codes 
for a Fading Channel 

Our treatment of coding for a Rayleigh channel to this point was based on the use of 
binary FSK as the modulation technique for transmitting each of the binary digits in a 
code word. For this modulation technique, all the $2^k$ code words in the $(n, k)$ code have 
identical transmitted energy. Furthermore, under the condition that the fading on the $n$ 
transmitted tones is mutually statistically independent and identically distributed, the 
average received signal energy for the $M = 2^k$ possible code words is also identical. 
Consequently, in a soft-decision decoder, the decision is made in favor of the code word 
having the largest decision variable. 

The condition that the received code words have identical average SNR has an 
important ramification in the implementation of the receiver. If the received code words 
do not have identical average SNR, the receiver must provide bias compensation for 
each received code word so as to render it equal energy. In general, the determination 
of the appropriate bias terms is difficult to implement because it requires the estimation 
of the average received signal power; hence, the equal-energy condition on the received 
code words considerably simplifies the receiver processing. 

There is an alternative modulation method for generating equal-energy waveforms 
from code words when the code is constant-weight, i.e., when every code word has 
the same number of 1s. Note that such a code is non-linear. Nevertheless, suppose we 
assign a single tone or cell to each bit position of the $2^k$ code words. Thus, an $(n, k)$ 
binary block code has $n$ tones assigned. Waveforms are constructed by transmitting the 
tone corresponding to a particular bit in a code word if that bit is a 1; otherwise, that 
tone is not transmitted for the duration of the interval. This modulation technique for 
transmitting the coded bits is called on-off keying (OOK). Since the code is constant-weight, 
say, $w$, every coded waveform consists of $w$ transmitted tones that depend on 
the positions of the 1s in each of the code words. 

As in FSK, all tones in the OOK signal that are transmitted over the channel are 
assumed to fade independently across the frequency band and in time from one code 
word to another. The received signal envelope for each tone is described statistically 
by the Rayleigh distribution. Statistically independent additive white Gaussian noise is 
assumed to be present in each frequency cell. 

The receiver employs maximum-likelihood (soft-decision) decoding to map the 
received waveform into one of the $M$ possible transmitted code words. For this purpose, 
$n$ matched filters are employed, each matched to one of the $n$ frequency tones. For the 
assumed statistical independence of the signal fading for the $n$ frequency cells and 
additive white Gaussian noise, the envelopes of the matched filter outputs are squared 
and combined to form the $M$ decision variables 

$$U_i = \sum_{j=1}^{n} c_{ij} |y_j|^2, \qquad i = 1, 2, \ldots, 2^k \tag{14.7-15}$$

where $|y_j|^2$ corresponds to the squared envelope of the filter corresponding to the $j$th 
frequency, where $j = 1, 2, \ldots, n$. 

It may appear that the constant-weight condition severely restricts our choice 
of codes. This is not the case, however. To illustrate this point, we briefly describe 
some methods for constructing constant-weight codes. This discussion is by no means 
exhaustive. 

Method 1: Non-linear transformation of a linear code In general, if in each word 
of an arbitrary binary code we substitute one binary sequence for every occurrence 
of a 0 and another sequence for each 1, a constant-weight binary block code will be 
obtained if the two substitution sequences are of equal weights and lengths. If the 
length of the sequence is $\nu$ and the original code is an $(n, k)$ code, then the resulting 
constant-weight code will be a $(\nu n, k)$ code. The weight will be $n$ times the weight of 
the substitution sequence, and the minimum distance will be the minimum distance 
of the original code times the distance between the two substitution sequences. Thus, 
the use of complementary sequences when $\nu$ is even results in a code with minimum 
distance $\nu d_{\min}$ and weight $\frac{1}{2}\nu n$. 

The simplest form of this method is the case $\nu = 2$, in which every 0 is replaced 
by the pair 01 and every 1 is replaced by the complementary sequence 10 (or vice 
versa). As an example, we take as the initial code the (24,12) extended Golay code. 
The parameters of the original and the resultant constant-weight code are given in 
Table 14.7-1. 

Note that this substitution process can be viewed as a separate encoding. This 
secondary encoding clearly does not alter the information content of a code word — 
it merely changes the form in which it is transmitted. Since the new code word is 
composed of pairs of bits — one “on” and one “off” — the use of OOK transmission of 
this code word produces a waveform that is identical to that obtained by binary FSK 
modulation for the underlying linear code. 
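Method 1 with $\nu = 2$ is easy to check on a small code. The Python sketch below is an illustration only: it uses a (7,4) Hamming code in place of the Golay code to keep the enumeration short, and the generator matrix and helper names are assumptions introduced here. It applies the substitution 0 → 01, 1 → 10 and confirms that the result has constant weight $\frac{1}{2}\nu n = 7$ and minimum distance $\nu d_{\min} = 6$.

```python
from itertools import product

# Assumed example: a systematic generator matrix of a (7,4) Hamming code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
]

def codewords(gen):
    """Enumerate all 2^k code words of the linear code generated by gen."""
    k, n = len(gen), len(gen[0])
    return [
        tuple(sum(m * row[j] for m, row in zip(msg, gen)) % 2 for j in range(n))
        for msg in product((0, 1), repeat=k)
    ]

def substitute(word, zero=(0, 1), one=(1, 0)):
    """Replace every 0 by `zero` and every 1 by `one` (Method 1 with v = 2)."""
    out = []
    for bit in word:
        out.extend(one if bit else zero)
    return tuple(out)

def min_distance(words):
    """Minimum Hamming distance over all distinct pairs of words."""
    return min(
        sum(a != b for a, b in zip(u, v))
        for idx, u in enumerate(words)
        for v in words[idx + 1:]
    )

original = codewords(G)
constant_weight = [substitute(c) for c in original]
```

The number of code words is unchanged by the substitution, which matches the observation that the secondary encoding does not alter the information content.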

Method 2: Expurgation In this method, we start with an arbitrary binary block 
code and select from it a subset consisting of all words of a certain weight. Several 
different constant-weight codes can be obtained from one initial code by varying the 
choice of the weight $w$. Since the code words of the resulting expurgated code can 
be viewed as a subset of all possible permutations of any one code word in the set, 
the term binary expurgated permutation modulation (BEXPERM) has been used by 
Gaarder (1971) to describe such a code. In fact, the constant-weight binary block codes 
constructed by the other methods may also be viewed as BEXPERM codes. This method 
of generating constant-weight codes is in a sense opposite to the first method in that 
the word length $n$ is held constant and the code size $M$ is changed. The minimum 
distance for the constant-weight subset will clearly be no less than that of the original 
code. As an example, we consider the Golay (24, 12) code and form the two different 
constant-weight codes shown in Table 14.7-2. 

TABLE 14.7-1 

Example of Constant-Weight Code Formed by Method 1 

Code parameters    Original Golay    Constant-weight 
n                  24                48 
k                  12                12 
M                  4096              4096 
d_min              8                 16 
w                  Variable          24 

TABLE 14.7-2 

Examples of Constant-Weight Codes Formed by Expurgation 

Parameters    Original    Constant weight no. 1    Constant weight no. 2 
n             24          24                       24 
k             12          9                        11 
M             4096        759                      2576 
d_min         8           ≥ 8                      ≥ 8 
w             Variable    8                        12 

Method 3: Hadamard matrices This method might appear to form a constant-weight 
binary block code directly, but it actually is a special case of the method 
of expurgation. In this method, a Hadamard matrix is formed as described in Section 7.3-5, 
and a constant-weight code is created by selection of rows (code words) 
from this matrix. Recall that a Hadamard matrix is an $n \times n$ matrix ($n$ even integer) 
of 1s and 0s with the property that any row differs from any other row in exactly $\frac{1}{2}n$ 
positions. One row of the matrix is normally chosen as being all 0s. 

In each of the other rows, half of the elements are 0s and the other half 1s. A 
Hadamard code of size $2(n - 1)$ code words is obtained by selecting these $n - 1$ rows 
and their complements. By selecting $M = 2^k \le 2(n - 1)$ of these code words, we 
obtain a Hadamard code, which we denote by $H(n, k)$, where each code word conveys 
$k$ information bits. The resulting code has constant weight $\frac{1}{2}n$ and minimum distance 
$d_{\min} = \frac{1}{2}n$. 
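The properties claimed for the Hadamard code can be verified numerically. The sketch below assumes the Sylvester (recursive doubling) construction written in 0/1 form, which requires $n$ to be a power of 2; whether this matches the construction of Section 7.3-5 in every detail is an assumption, and the function names are hypothetical. It builds the $2(n - 1)$ code words and checks their weight and minimum distance.

```python
def hadamard01(n):
    """Sylvester construction in 0/1 form; n is assumed to be a power of 2."""
    h = [[0]]
    while len(h) < n:
        # [H H; H H_complement] doubles the size at each step
        h = [row + row for row in h] + [row + [1 - x for x in row] for row in h]
    return h

def hadamard_code(n):
    """The n - 1 nonzero rows and their complements: 2(n - 1) code words."""
    rows = hadamard01(n)[1:]  # drop the all-zero row
    return rows + [[1 - x for x in row] for row in rows]

def min_distance(words):
    """Minimum Hamming distance over all distinct pairs of words."""
    return min(
        sum(a != b for a, b in zip(u, v))
        for idx, u in enumerate(words)
        for v in words[idx + 1:]
    )
```

Selecting any $M = 2^k$ of the words returned by `hadamard_code(n)` gives an $H(n, k)$ code in the sense used above; the subset keeps the constant weight, and its minimum distance cannot drop below that of the full set.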

Since $n$ frequency cells are used to transmit $k$ information bits, the bandwidth 
expansion factor for the Hadamard $H(n, k)$ code is defined as 

$$B_e = \frac{n}{k} \quad \text{cells per information bit}$$

which is simply the reciprocal of the code rate. Also, the average SNR per bit, denoted 
by $\bar{\gamma}_b$, is related to the average SNR per cell, $\bar{\gamma}_c$, by the expression 

$$\bar{\gamma}_c = \frac{k}{\frac{1}{2}n} \bar{\gamma}_b = 2\,\frac{k}{n}\, \bar{\gamma}_b = 2 R_c \bar{\gamma}_b = \frac{2\bar{\gamma}_b}{B_e} \tag{14.7-16}$$

Let us compare the performance of the constant-weight Hadamard codes under
a fixed bandwidth constraint with a conventional M-ary orthogonal set of waveforms
where each waveform has diversity L. The M orthogonal waveforms with diversity are
equivalent to a block orthogonal code having a block length n = LM and k = log_2 M.
For example, if M = 4 and L = 2, the code words of the block orthogonal code are

c_1 = [1 1 0 0 0 0 0 0]
c_2 = [0 0 1 1 0 0 0 0]
c_3 = [0 0 0 0 1 1 0 0]
c_4 = [0 0 0 0 0 0 1 1]


To transmit these code words using OOK modulation requires n = 8 cells, and since
each code word conveys k = 2 bits of information, the bandwidth expansion factor is
B_e = 4. In general, we denote the block orthogonal code as O(n, k). The bandwidth
expansion factor is

$$B_e = \frac{LM}{k}$$    (14.7-17)
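As a small illustration of the block orthogonal code O(n, k) and Equation 14.7-17, the sketch below builds the code words for a given M and L and evaluates the bandwidth expansion factor. The function names are illustrative and not part of the text.

```python
import math

def block_orthogonal_code(M, L):
    """Code words of the block orthogonal code with n = L*M cells.

    Code word i has L consecutive 1s in cells i*L ... i*L + L - 1 and 0s elsewhere.
    """
    n = L * M
    return [[1 if j // L == i else 0 for j in range(n)] for i in range(M)]

def bandwidth_expansion(M, L):
    """B_e = L*M / k with k = log2(M) information bits per code word."""
    return L * M / math.log2(M)

code_words = block_orthogonal_code(4, 2)   # the M = 4, L = 2 example
b_e = bandwidth_expansion(4, 2)            # 8 cells / 2 bits
```

With M = 4 and L = 2 this reproduces the four code words listed in the example and gives B_e = 4.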


Also, the SNR per bit is related to the SNR per cell by the expression

$$\gamma_c = \frac{k}{L}\,\gamma_b = M\left(\frac{k}{n}\right)\gamma_b = \frac{M\gamma_b}{B_e}$$    (14.7-18)

Now we turn our attention to the performance characteristics of these codes. First,
the exact probability of a code word (symbol) error for M-ary orthogonal signaling
over a Rayleigh fading channel with diversity was given in closed form in Section 13.4.
As previously indicated, this expression is rather cumbersome to evaluate, especially
if either L or M or both are large. Instead, we shall use a union bound that is very
convenient. That is, for a set of M orthogonal waveforms, the probability of a symbol
error can be upper-bounded as

$$P_e < (M-1)P_2(L) = (2^k-1)P_2(L) < 2^k P_2(L)$$    (14.7-19)

where P_2(L), the probability of error for two orthogonal waveforms, each with diversity
L, is given by Equation 14.7-12 with p = 1/(2 + γ_c). The probability of bit error is
obtained by multiplying P_e by 2^(k−1)/(2^k − 1), as explained previously.

A simple upper (union) bound on the probability of a code word error for the
Hadamard H(n, k) code is obtained by noting the probability of error in deciding
between the transmitted code word and any other code word is bounded from above by
P_2(d_min/2), where d_min is the minimum distance of the code. Therefore, an upper bound
on P_e is

$$P_e < (M-1)P_2\left(\tfrac{1}{2}d_{min}\right) < 2^k P_2\left(\tfrac{1}{2}d_{min}\right)$$    (14.7-20)

Thus the "effective order of diversity" of the code for OOK modulation is d_min/2.
The bit error probability may be approximated as P_e/2, or slightly overbounded by
multiplying P_e by the factor 2^(k−1)/(2^k − 1), which is the factor used above for
orthogonal codes. The latter was selected for the error probability computations given
below.
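The two union bounds can be evaluated with a few lines of code. The sketch below is hedged in two ways. First, Equation 14.7-12 is not reproduced in this part of the text, so the pairwise error probability is assumed to have the standard form for binary orthogonal signaling with L-fold diversity on a Rayleigh channel, P_2(L) = p^L Σ_{j=0}^{L−1} C(L−1+j, j)(1−p)^j with p = 1/(2 + γ_c). Second, a code repeated r times is assumed to have r times the block length and r times the minimum distance. All function names are illustrative.

```python
from math import comb

def p2(L, gamma_c):
    """Assumed form of Eq. 14.7-12: pairwise error probability with diversity L."""
    p = 1.0 / (2.0 + gamma_c)
    return p**L * sum(comb(L - 1 + j, j) * (1.0 - p)**j for j in range(L))

def pb_block_orthogonal(k, L, gamma_b):
    """Bit error bound from Eq. 14.7-19 for M = 2**k orthogonal words, diversity L."""
    M = 2**k
    gamma_c = k * gamma_b / L                       # Eq. 14.7-18
    pe = (M - 1) * p2(L, gamma_c)
    return pe * 2**(k - 1) / (2**k - 1)

def pb_hadamard(n, k, gamma_b, repeats=1):
    """Bit error bound from Eq. 14.7-20 for H(n, k), optionally repeated."""
    n_total = repeats * n
    d_min = n_total // 2                            # constant weight and d_min are n/2
    gamma_c = 2.0 * k * gamma_b / n_total           # Eq. 14.7-16
    pe = (2**k - 1) * p2(d_min // 2, gamma_c)       # effective diversity d_min/2
    return pe * 2**(k - 1) / (2**k - 1)

gamma_b = 10 ** (20 / 10)                           # 20 dB SNR per bit
orth = pb_block_orthogonal(k=2, L=4, gamma_b=gamma_b)        # B_e = 4*4/2 = 8
had = pb_hadamard(n=20, k=5, gamma_b=gamma_b, repeats=2)     # B_e = 40/5 = 8
```

Both codes in the usage example have B_e = 8; at this SNR the Hadamard bound is the smaller of the two, which reflects its larger effective diversity (10 versus 4).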

FIGURE 14.7-5
Performance of Hadamard codes.

Figure 14.7-5 illustrates the error rate performance of a selected number of
Hadamard codes for several bandwidth expansion factors. The advantage resulting
from an increase in the size M of the alphabet (or k, since k = log_2 M) and an increase
in the bandwidth expansion factor is apparent from observation of these curves. Note,
for example, that the H(20, 5) code when repeated twice results in a code that is
denoted by 2H(20, 5) and has a bandwidth expansion factor B_e = 8. Figure 14.7-6 shows
the performance of the Hadamard and block orthogonal codes compared on the basis 
of equal bandwidth expansion factors. It is observed that the error rate curves for the 
Hadamard codes are steeper than the corresponding curves for the block orthogonal 
codes. This characteristic behavior is due simply to the fact that, for the same bandwidth 
expansion factor, the Hadamard codes provide more diversity than block orthogonal 
codes. Alternatively, one may say that Hadamard codes provide better bandwidth effi- 
ciency than block orthogonal codes. It should be mentioned, however, that at low SNR, 
a lower-diversity code outperforms a higher-diversity code as a consequence of the fact 
that, on a Rayleigh fading channel, there is an optimum distribution of the total received 
SNR among the diversity signals. Therefore, the curves for the block orthogonal codes 
will cross over the curves for the Hadamard codes at the low-SNR (high-error-rate) 
region. 

FIGURE 14.7-6
Comparison of performance between Hadamard codes and block orthogonal codes.

Method 4: Concatenation In this method, we begin with two codes: one binary
and the other nonbinary. The binary code is the inner code and is an (n, k)
constant-weight (nonlinear) block code. The nonbinary code, which may be linear, is the outer
code. To distinguish it from the inner code, we use uppercase letters, e.g., an (N, K)
code, where N and K are measured in terms of symbols from a q-ary alphabet. The
size q of the alphabet over which the outer code is defined cannot be greater than the
number of words in the inner code. The outer code, when defined in terms of the binary
inner code words rather than q-ary symbols, is the new code.

An important special case is obtained when q = 2^k and the inner code size is
chosen to be 2^k. Then the number of words is M = 2^(kK) and the concatenated structure
is an (nN, kK) code. The bandwidth expansion factor of this concatenated code is the
product of the bandwidth expansions for the inner and outer codes.

Now we shall demonstrate the performance advantages obtained on a Rayleigh
fading channel by means of code concatenation. Specifically, we construct a
concatenated code in which the outer code is a dual-k (nonbinary) convolutional code and the
inner code is either a Hadamard code or a block orthogonal code. That is, we view the
dual-k code with M-ary (M = 2^k) orthogonal signals for modulation as a concatenated
code. In all cases to be considered, soft-decision demodulation and Viterbi decoding
are assumed.

The error rate performance of the dual-k convolutional codes is obtained from the
derivation of the transfer function given by Equation 8.7-2. For a rate-1/2, dual-k code
with no repetitions, the bit error probability, appropriate for the case in which each k-bit
output symbol from the dual-k encoder is mapped into one of M = 2^k orthogonal code
words, is upper-bounded as

$$P_b < \frac{2^{k-1}}{2^k-1}\sum_{m=4}^{\infty}\beta_m P_2(m)$$    (14.7-21)

where P_2(m) is given by Equation 14.7-12 and the coefficients β_m follow from the
transfer function.
For example, a rate-1/2, dual-2 code may employ a 4-ary orthogonal code O(4, 2)
as the inner code. The bandwidth expansion factor of the resulting concatenated code
is, of course, the product of the bandwidth expansion factors of the inner and outer
codes. Thus, in this example, the rate of the outer code is 1/2 and the rate of the inner
code is 1/2. Hence, B_e = (4/2)(2) = 4.

Note that if every symbol of the dual-k code is repeated r times, this is equivalent to
using an orthogonal code with diversity L = r. If we select r = 2 in the example
given above, the resulting orthogonal code is denoted as O(8, 2) and the bandwidth
expansion factor for the rate-1/2, dual-2 code becomes B_e = 8. Consequently, the term
P_2(m) in Equation 14.7-21 must be replaced by P_2(mL) when the orthogonal code
has diversity L. Since a Hadamard code has an "effective diversity" d_min/2, it follows
that when a Hadamard code is used as the inner code with a dual-k outer code, the
upper bound on the bit error probability of the resulting concatenated code given by
Equation 14.7-21 still applies if P_2(m) is replaced by P_2(m d_min/2). With these
modifications, the upper bound on the bit error probability given by Equation 14.7-21 has
been evaluated for rate-1/2, dual-k convolutional codes with either Hadamard codes
or block orthogonal codes as inner codes. Thus the resulting concatenated code has a
bandwidth expansion factor equal to twice the bandwidth expansion factor of the inner
code.

First, we consider the performance gains due to code concatenation. Figure 14.7-7
illustrates the performance of dual-k codes with block orthogonal inner codes compared
with the performance of block orthogonal codes for bandwidth expansion factors B_e =
4, 8, 16, and 32. The performance gains due to concatenation are very impressive.
For example, at an error rate of 10^−6 and B_e = 8, the dual-k code outperforms the
orthogonal block code by 7.5 dB. In short, this gain may be attributed to the increased
diversity (increase in minimum distance) obtained via code concatenation. Similarly,
Figure 14.7-8 illustrates the performance of two dual-k codes with Hadamard inner
codes compared with the performance of the Hadamard codes alone for B_e = 8 and 12.
It is observed that the performance gains due to code concatenation are still significant,
but certainly not as impressive as those illustrated in Figure 14.7-7. The reason is that
the Hadamard codes alone yield a large diversity, so that the increased diversity arising
from concatenation does not result in as large a gain in performance for the range of
error rates covered in Figure 14.7-8.

FIGURE 14.7-7
Comparison of performance between block orthogonal codes and dual-k codes with
block orthogonal inner codes. (Horizontal axis: SNR per bit, γ_b (dB).)

FIGURE 14.7-8
Comparison of performance between Hadamard codes and dual-k codes with
Hadamard inner codes.

The numerical results given above illustrate the performance advantages in using
codes with good distance properties and soft-decision decoding on a Rayleigh fading
channel as an alternative to conventional M-ary orthogonal signaling with diversity.
In addition, the results illustrate the benefits of code concatenation on such a channel,
using a dual-k convolutional code as the outer code and either a Hadamard code or a
block orthogonal code as the inner code. Although dual-k codes were used for the outer
code, similar results are obtained when a Reed-Solomon code is used for the outer
code. There is an even greater choice in the selection of the inner code.

The important parameter in the selection of both the outer and the inner codes 
is the minimum distance of the resultant concatenated code required to achieve a 
specified level of performance. Since many codes will meet the performance require- 
ments, the ultimate choice is made on the basis of decoding complexity and bandwidth 
requirements. 
■ 14.8

THE CHANNEL CUTOFF RATE FOR FADING CHANNELS 


We studied the notion and significance of the channel cutoff rate for the general class 
of memoryless channels in Section 6.8. In the same section we obtained expressions 
for the channel cutoff rate for the special cases of a BSC channel and a binary-input, 
continuous-output Gaussian channel. In this section we extend those results to the case 
of fully interleaved Ricean and Rayleigh fading channels for the cases where CSI is 
available at the receiver. 

We have seen in Section 6.8 that for a general memoryless channel the cutoff rate 
can be expressed by Equation 6.8-20 as 


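$$R_0 = \max_{p(x)}\,\sup_{\lambda>0} R_0(p,\lambda) = \max_{p(x)}\,\sup_{\lambda>0}\left\{-\log_2 E\left[\Delta^{(\lambda)}_{X_1\to X_2}\right]\right\}$$    (14.8-1)

where for a symmetric channel model the maximum is achieved for λ = 1/2, i.e., by
substituting the Chernov bound by the Bhattacharyya bound, or substituting
Δ^(λ)_{x1→x2} by Δ_{x1,x2}. The values of Δ^(λ)_{x1→x2} and Δ_{x1,x2} are given by
Equation 6.8-10 as

$$\Delta^{(\lambda)}_{x_1\to x_2} = \sum_{y} p^{\lambda}(y|x_2)\,p^{1-\lambda}(y|x_1)$$
$$\Delta_{x_1,x_2} = \sum_{y} \sqrt{p(y|x_1)\,p(y|x_2)}$$    (14.8-2)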


where the summation on y corresponds to a discrete-output channel, which should be
substituted by integration over the output space for a continuous-output channel. The
expectation in Equation 14.8-1 is taken over independent inputs X_1 and X_2, each
distributed according to p(x), i.e.,

$$E\left[\Delta^{(\lambda)}_{X_1\to X_2}\right] = \sum_{x_1}\sum_{x_2} p(x_1)\,p(x_2)\,\Delta^{(\lambda)}_{x_1\to x_2}$$    (14.8-3)


where for continuous-input channels the summations are substituted by integrals. 


14.8-1 Channel Cutoff Rate for Fully Interleaved Fading 
Channels with CSI at Receiver 

For this channel model, ideal interleaving causes the channel model to be memoryless. 
The availability of CSI at the receiver can be interpreted as extending the channel 
output to be both the regular channel output y and the fading information. The channel 
is described as a memoryless model in which 


$$y_i = R_i x_i + n_i$$    (14.8-4)

where R_i denotes the iid fading process and n_i is the iid noise process, which is assumed
to be distributed according to CN(0, N_0) and is independent of the fading process. The
channel inputs are assumed to be points in a complex constellation. For a Rayleigh
fading channel the R_i's are iid drawn according to CN(0, 2σ²). Since channel state
information is available at the decoder, we can consider the pair (y_i, r_i) as the channel
output. Therefore for this channel model P[output | input] can be written as

$$p(r, y|x) = p(r)\,p(y|r, x)$$    (14.8-5)


Since the channel model is symmetric, we use the Bhattacharyya bound and from
Equation 14.8-2 we obtain

$$\Delta_{x_1,x_2} = \int_0^{\infty}\left[\int_{-\infty}^{\infty}\sqrt{p(y|x_1,r)\,p(y|x_2,r)}\,dy\right]p(r)\,dr = E\left[\int_{-\infty}^{\infty}\sqrt{p(y|x_1,R)\,p(y|x_2,R)}\,dy\right]$$    (14.8-6)

where the expectation is taken with respect to the random variable R. For the channel
model of Equation 14.8-4 we have

$$p(y|x,r) = \frac{1}{\pi N_0}\,e^{-|y-rx|^2/N_0}$$    (14.8-7)

Using Equation 14.8-7 after completing the square in the exponent and some
manipulation, we obtain

$$\int_{-\infty}^{\infty}\sqrt{p(y|x_1,r)\,p(y|x_2,r)}\,dy = e^{-|r|^2|x_1-x_2|^2/(4N_0)}$$    (14.8-8)
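The intermediate steps can be sketched as follows, assuming the integral over y is taken over the complex plane and writing m = r(x_1 + x_2)/2, a symbol introduced here only for convenience. From Equation 14.8-7,

$$\sqrt{p(y|x_1,r)\,p(y|x_2,r)} = \frac{1}{\pi N_0}\exp\left(-\frac{|y-rx_1|^2+|y-rx_2|^2}{2N_0}\right)$$

Completing the square in y gives

$$|y-rx_1|^2+|y-rx_2|^2 = 2|y-m|^2+\tfrac{1}{2}|r|^2|x_1-x_2|^2$$

so that

$$\int\sqrt{p(y|x_1,r)\,p(y|x_2,r)}\,dy = e^{-|r|^2|x_1-x_2|^2/(4N_0)}\int\frac{1}{\pi N_0}\,e^{-|y-m|^2/N_0}\,dy = e^{-|r|^2|x_1-x_2|^2/(4N_0)}$$

because the remaining integrand is a CN(m, N_0) density and integrates to 1.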


or

$$\Delta_{x_1,x_2} = E\left[e^{-|R|^2 d_{12}^2/(4N_0)}\right]$$    (14.8-9)

where d_12 = |x_1 − x_2|. Defining

$$\alpha_{12} = \frac{d_{12}^2}{4N_0}$$    (14.8-10)

we obtain

$$\Delta_{x_1,x_2} = E\left[e^{-\alpha_{12}|R|^2}\right]$$    (14.8-11)

In other words, Δ_{x1,x2} is equal to Θ_{|R|²}(t), the moment generating function of the
random variable |R|², i.e., the squared envelope of the fading process, when t is
substituted with −α_12.

For a Ricean fading channel |R| has a Ricean distribution and |R|² has a noncentral
χ² PDF with two degrees of freedom and parameters s and σ². From Table 2.3-3 we
obtain the characteristic function of |R|², and from it we obtain

$$\Delta_{x_1,x_2} = \frac{1}{1+2\alpha_{12}\sigma^2}\,e^{-\alpha_{12}s^2/(1+2\alpha_{12}\sigma^2)}$$    (14.8-12)
By substituting the terms A = s² + 2σ² and K = s²/(2σ²) in Equation 14.8-12, we have

$$\Delta_{x_1,x_2} = \frac{K+1}{K+1+A\alpha_{12}}\,e^{-AK\alpha_{12}/(K+1+A\alpha_{12})}$$    (14.8-13)
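The substitution can be written out explicitly; this is a short worked step rather than part of the original derivation. Solving A = s² + 2σ² and K = s²/(2σ²) for the two parameters gives

$$2\sigma^2 = \frac{A}{K+1}, \qquad s^2 = \frac{AK}{K+1}$$

Hence

$$1+2\alpha_{12}\sigma^2 = \frac{K+1+A\alpha_{12}}{K+1}, \qquad \frac{\alpha_{12}s^2}{1+2\alpha_{12}\sigma^2} = \frac{AK\alpha_{12}}{K+1+A\alpha_{12}}$$

and inserting both expressions into Equation 14.8-12 yields Equation 14.8-13.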


Note that A = E[|R|²] represents the average power gain of the channel. If we assume
that A = 1, the transmitted and received powers become equal. For this case

$$\Delta_{x_1,x_2} = \frac{K+1}{K+1+\alpha_{12}}\,e^{-K\alpha_{12}/(K+1+\alpha_{12})}$$    (14.8-14)

For a Rayleigh fading channel we have s = K = 0 and

$$\Delta_{x_1,x_2} = \frac{1}{1+\alpha_{12}}$$    (14.8-15)

Note that in all cases studied above, if x_1 = x_2, then α_12 = 0 and Δ_{x1,x2} = 1.
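The closed forms can be checked against a direct Monte Carlo estimate of E[exp(−α_12 |R|²)]. The sketch below assumes the usual complex Gaussian model for Ricean fading, R = s + σ(G_1 + jG_2) with G_1 and G_2 independent standard normal variables, so that E[|R|²] = s² + 2σ² = A. The function names, sample size, and seed are illustrative choices.

```python
import numpy as np

def delta_closed_form(alpha, K, A=1.0):
    """Eq. 14.8-13 (reduces to Eq. 14.8-14 for A = 1 and to Eq. 14.8-15 for K = 0)."""
    return (K + 1) / (K + 1 + A * alpha) * np.exp(-A * K * alpha / (K + 1 + A * alpha))

def delta_monte_carlo(alpha, K, A=1.0, num_samples=200_000, seed=0):
    """Sample-average estimate of E[exp(-alpha |R|^2)] for Ricean fading."""
    rng = np.random.default_rng(seed)
    s = np.sqrt(A * K / (K + 1))             # line-of-sight amplitude
    sigma = np.sqrt(A / (2 * (K + 1)))       # per-dimension scatter std deviation
    r = s + sigma * (rng.standard_normal(num_samples)
                     + 1j * rng.standard_normal(num_samples))
    return float(np.mean(np.exp(-alpha * np.abs(r) ** 2)))

rayleigh_gap = abs(delta_monte_carlo(1.0, 0.0) - delta_closed_form(1.0, 0.0))
ricean_gap = abs(delta_monte_carlo(2.0, 3.0) - delta_closed_form(2.0, 3.0))
```

Both gaps should be small compared with the values themselves; the residual difference is sampling noise and shrinks as num_samples grows.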

For a BPSK modulation system the optimal p(x) to achieve R_0 is a uniform
distribution. To compute R_0, we need to find E[Δ_{X1,X2}]. For a uniform distribution on
the inputs ±√E_s, the probability of X_1 = X_2 is 1/2 and the probability of X_1 ≠ X_2 is
also 1/2. For this latter case d_12² = 4E_s, and from Equation 14.8-10 we obtain
α_12 = E_s/N_0 = SNR. Therefore,

$$E\left[\Delta_{X_1,X_2}\right] = \frac{1}{2}+\frac{1}{2}\Delta = \frac{\Delta+1}{2}$$    (14.8-16)

where

$$\Delta = \frac{K+1}{K+1+\mathrm{SNR}}\,e^{-K\,\mathrm{SNR}/(K+1+\mathrm{SNR})}$$    (14.8-17)

and finally

$$R_0 = -\log_2\frac{\Delta+1}{2} = 1-\log_2\left(1+\frac{K+1}{K+1+\mathrm{SNR}}\,e^{-K\,\mathrm{SNR}/(K+1+\mathrm{SNR})}\right)$$    (14.8-18)

For the case of a Rayleigh fading channel, this relation reduces to

$$R_0 = 1-\log_2\left(1+\frac{1}{1+\mathrm{SNR}}\right)$$    (14.8-19)
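Equations 14.8-17 to 14.8-19 translate directly into code. The following sketch evaluates the BPSK cutoff rate for a given linear SNR and Rice factor K, with unit average power gain (A = 1) as in the text. The function names are illustrative.

```python
import math

def delta_bpsk(snr, K=0.0):
    """Eq. 14.8-17: Bhattacharyya parameter for antipodal inputs, A = 1."""
    return (K + 1) / (K + 1 + snr) * math.exp(-K * snr / (K + 1 + snr))

def r0_bpsk(snr, K=0.0):
    """Eq. 14.8-18; K = 0 gives the Rayleigh result of Eq. 14.8-19."""
    return 1.0 - math.log2(1.0 + delta_bpsk(snr, K))

def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear ratio."""
    return 10.0 ** (snr_db / 10.0)

rayleigh_r0 = r0_bpsk(db_to_linear(10.0))         # K = 0
ricean_r0 = r0_bpsk(db_to_linear(10.0), K=5.0)    # stronger line-of-sight component
```

At SNR = 0 the cutoff rate is 0, and it approaches 1 bit per channel use as the SNR grows. For a fixed SNR, a larger K yields a smaller Δ and therefore a larger R_0.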


For QPSK signaling the optimal input probability distribution is a uniform
distribution. In this case, d_12² = 0, or 2E_s, or 4E_s with probabilities 1/4, 1/2, and 1/4,
respectively. The corresponding values of α_12 are 0, SNR/2, and SNR, respectively.
Substituting these values into Equation 14.8-14, we obtain

$$E[\Delta] = \frac{1}{4}+\frac{1}{2}\,g\!\left(\frac{\mathrm{SNR}}{2}\right)+\frac{1}{4}\,g(\mathrm{SNR})$$    (14.8-20)

where

$$g(\alpha) = \frac{K+1}{K+1+\alpha}\,e^{-K\alpha/(K+1+\alpha)}$$    (14.8-21)

The Rayleigh fading case is obtained by putting K = 0 in Equation 14.8-21. The
result is

$$E[\Delta] = \frac{(\mathrm{SNR})^2+8\,\mathrm{SNR}+8}{4(\mathrm{SNR}+2)(\mathrm{SNR}+1)}$$    (14.8-22)

Finally R_0 is obtained using

$$R_0 = -\log_2 E[\Delta]$$    (14.8-23)

where E[Δ] is obtained from Equations 14.8-20 and 14.8-22. Plots of R_0 versus
SNR = E_s/N_0 for BPSK and QPSK in the case of a Rayleigh fading channel are shown
in Figure 14.8-1.

FIGURE 14.8-1
The cutoff rate versus SNR for BPSK and QPSK over a Rayleigh fading channel.
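The QPSK expressions can be evaluated in the same way. The sketch below computes E[Δ] from Equations 14.8-20 and 14.8-21 and, for K = 0, compares it with the closed form of Equation 14.8-22. The function names are illustrative.

```python
import math

def g(alpha, K=0.0):
    """Eq. 14.8-21."""
    return (K + 1) / (K + 1 + alpha) * math.exp(-K * alpha / (K + 1 + alpha))

def expected_delta_qpsk(snr, K=0.0):
    """Eq. 14.8-20: average over squared distances 0, 2E_s, 4E_s."""
    return 0.25 + 0.5 * g(snr / 2.0, K) + 0.25 * g(snr, K)

def expected_delta_qpsk_rayleigh(snr):
    """Eq. 14.8-22: closed form for K = 0."""
    return (snr**2 + 8.0 * snr + 8.0) / (4.0 * (snr + 2.0) * (snr + 1.0))

def r0_qpsk(snr, K=0.0):
    """Eq. 14.8-23."""
    return -math.log2(expected_delta_qpsk(snr, K))

snr = 10.0 ** (10.0 / 10.0)   # 10 dB
mismatch = abs(expected_delta_qpsk(snr) - expected_delta_qpsk_rayleigh(snr))
```

The mismatch is at the level of floating-point rounding, which confirms that Equation 14.8-22 is Equation 14.8-20 with K = 0. The cutoff rate rises from 0 at SNR = 0 toward 2 bits per channel use at high SNR.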


■ 14.9 

BIBLIOGRAPHICAL NOTES AND REFERENCES 

A comprehensive treatment of channel modeling, signaling, capacity issues, and coding 
techniques for fading channels can be found in Biglieri et al. (1998b). This paper 
summarizes and unifies the main results available on fading channel modeling, capacity, 
and coding up to 1998 and includes many references. Channel capacity for finite-state
channels with different assumptions on the availability of state information is
considered in Shannon (1958), Wolfowitz (1978), Salehi (1992), Cover and Chiang
(2002), Goldsmith and Varaiya (1997), Goldsmith and Varaiya (1996), Abou-Faycal 
et al. (2001), and Ozarow et al. (1994). 

Trellis-coded modulation for fading channels has been extensively treated in the 
books by Biglieri et al. (1991) and Jamali and Le-Ngoc (1994) as well as in the papers 
by Divsalar and Simon (1988a, b, c), Sundberg and Seshadri (1993) and Salehi and Proakis
(1995). Coding for fading channels is also the subject of the book by Biglieri (2005) 
where both coding and capacity issues under different assumptions have been treated. 
The book by ?) also covers capacity and coding issues for wireless channels with 
emphasis on multiantenna systems. 

Bit-interleaved coded modulation introduced by Zehavi (1992) has been treated 
extensively in the paper by Caire et al. (1998). Other papers studying different aspects 
of this technique including error performance, iterative decoding, and optimal labeling 
under iterative decoding include the works of Ormeci et al. (2001), Martinez et al. 
(2006), and Li and Ritcey (1997, 1998, 1999). 

The use of dual-k codes with M-ary orthogonal FSK was proposed in publications
by Viterbi and Jacobs (1975) and Odenwalder (1976). The importance of coding for
digital communications over a fading channel was also emphasized in a paper by Chase
(1976). The benefits derived from concatenated coding with soft decision decoding for
a fading channel were demonstrated by Pieper et al. (1978). The performance of dual-k
codes with either block orthogonal codes or Hadamard codes as inner codes was
investigated by Proakis and Rahman (1979). The error rate performance of maximal
free-distance binary convolutional codes was evaluated by Rahman (1981).


PROBLEMS 

14.1 Channels 1 and 2 are both continuous-time additive Gaussian noise channels described
by Y_1(t) = X_1(t) + Z_1(t) and Y_2(t) = X_2(t) + Z_2(t), respectively. Z_1(t) and Z_2(t) are
the noise processes of the channels. It is assumed that Z_1(t) and Z_2(t) are zero-mean,
independent Gaussian processes with power spectral densities N_1(f) and N_2(f) W/Hz,
as shown in Figure P14.1. It is assumed that each channel has an input power constraint
of 10 mW.

1. Determine C\ and C 2 , the capacities of the two channels (in bits per second). 

2. If a binary memoryless source with P(U = 0) = 1 —P(U = 1) = 0.4 which generates 
7500 symbols per second is to be transmitted once via channel 1 and once via channel 2, 
determine in each case the absolute minimum achievable error probability. 

3. Now consider the two channel configurations shown in Figure P14.1. The first
configuration is simply a concatenation of the two original channels. The second concatenation
allows a processor with arbitrary complexity to be used between the two channels. In 
each case determine the absolute minimum achievable error probability for the binary 
source of part 2 when transmitted over the given channel configuration. 

4. What is the capacity of channel 1 if the input power constraint is increased from 10 
to 100 mW? 


[Figure P14.1: noise power spectral densities N_1(f) and N_2(f) versus f (kHz); block
diagrams of Configuration 1 and Configuration 2]

FIGURE P14.1


14.2 Consider the channel model shown in Figure 14.2-1 and assume both channel components 
are BSC channels with crossover probability p = \. 

1. What is the ergodic capacity of this channel? 

2. Now assume that the transmitter can control the state of the channel and the receiver 
has access to channel state information. What is the capacity of the resulting channel? 

14.3 Using Equation 14.1-19, determine the capacity of a finite-state channel in which state 
information is only available at the receiver. 

14.4 Using Equation 14.1-19, determine the capacity of a finite-state channel in which the 
same state information is available at the transmitter and the receiver. 
14.5 Consider a BSC in which the channel can be in three states. In state S = 0 the output
of the channel is always 0, regardless of the channel input. In state S = 1, the output is
always 1, again regardless of the channel input. In state S = 2 the channel is noiseless,

i.e., the output is always equal to the input. We assume that P(S = 0) = P(S = 1) = |. 

1. Determine the capacity of this channel, assuming no state information is available to 
the transmitter or the receiver. 

2. Determine the capacity of the channel, assuming that channel state information S is
available at both sides. 

14.6 In Problem 14.5 assume that the same noisy versions of state information are available
at both sides; i.e., Z = U = V is available where Z is a binary-valued random variable
with

P[Z = 0 | S = 0] = P[Z = 1 | S = 1] = 1

P[Z = 0 | S = 2] = P[Z = 1 | S = 2] = 1/2

Determine the capacity of this channel.

14.7 Consider the channel model shown in Figure 14.2-1. Assume that the top channel is a 
noiseless BSC channel for which crossover probability is zero and the bottom channel 
is a binary-input binary-output Z channel with P[Y = 1 | X = 1] = 1 and P[Y =
0 | X = 0 ] = j . The channel switches between the two states independently for each
transmission, and the two states are equiprobable. 

1 . Determine the ergodic capacity of this channel when no state information is available. 

2. Determine the ergodic capacity of the channel when perfect state information is avail- 
able at both sides. 

3. Determine the ergodic capacity of the channel when perfect state information is avail- 
able at the receiver. 

14.8 Prove that Equation 14.2-11 can be simplified in the form of Equation 14.2-13.

14.9 In Figure 14.4-1, determine the optimal rotation that maximizes the coding gain. What 
is the resulting coding gain? 

14.10 A fading channel model that is flat in both time and frequency can be modeled as y = 
Rx + n, where the fading factor R remains constant for the entire duration of the trans- 
mission of the codeword. Determine the optimal decision rule for this channel for Ricean 
fading when the state information is available at the receiver and when it is not available. 

14.11 The outage probability of a diversity combiner is defined as the percentage of time the 
instantaneous output SNR of the combiner is below some prescribed level for a specified 
number of diversity branches. Consider a communication system that employs multiple 
receiver antennas to achieve diversity in a Rayleigh fading channel. Suppose that selection 
diversity is used with N r receiver antennas. If the average SNR is 20 dB, determine the 
probability that the instantaneous SNR drops below 10 dB when 

1. N r = 1 

2. N r = 2 

3. N r = 4 
14.12 The Gauss-Markov model for a time-varying channel is given by

$$h(m+1) = \sqrt{1-a}\,h(m)+\sqrt{a}\,w(m+1)$$

where {w(m)} is a sequence of iid CN(0, 1) random variables independent of h(0) ~
CN(0, 1). The sampling time is T_s. The coherence time of this channel is controlled by
the choice of parameter a.

1. Calculate the autocorrelation function of the sequence {h(m)} denoted by R_h(m).

2. Define coherence time as that corresponding to R_h(m) = 0.5. Determine the value of
a in terms of T_s and the coherence time T_c.

3. Suppose that {h(m)} is transmitted from the receiver to the transmitter with a delay
of T_s. The transmitter predicts the value of h(m), say ĥ(m), from the past samples
h(m − n) and h(m − n − 1). Thus

$$\hat{h}(m) = b_1 h(m-n)+b_2 h(m-n-1)$$

where the prediction coefficients b_1 and b_2 are determined to minimize the MSE

$$E\left[|e|^2\right] = E\left[|h(m)-\hat{h}(m)|^2\right]$$

Determine b_1 and b_2 that minimize MSE.

14.13 The rate 1/3, K = 3, binary convolutional code with transfer function given by Equa- 
tion 8.1-21 is used for transmitting data over a Rayleigh fading channel via binary PSK. 

1. Determine and plot the probability of error for hard decision decoding. Assume that 
the transmitted waveforms corresponding to the coded bits fade independently. 

2. Determine and plot the probability of error for soft decision decoding. Assume that 
the waveforms corresponding to the coded bits fade independently. 

14.14 Show that the pairwise error probability for a fully interleaved Rayleigh fading channel
with fading process R_i can be bounded by

$$P_{\mathbf{x}\to\hat{\mathbf{x}}} \le E\left[\prod_i e^{-R_i^2|x_i-\hat{x}_i|^2/(4N_0)}\right]$$

where the expectation is taken with respect to R_i's. From above conclude the following
bound on the pairwise error probability.

$$P_{\mathbf{x}\to\hat{\mathbf{x}}} \le \prod_i \frac{1}{1+|x_i-\hat{x}_i|^2/(4N_0)}$$


14.15 Determine the product distance and the free Euclidean distance of the coded modulation 
scheme shown in Figure 14.5-1. 
14.16 Determine the product distance and the free Euclidean distance of the coded modulation
scheme shown in Figure 14.5-2. 

14.17 Show that the signal set assignment of Figure 14.5-5 provides a performance 1.315 dB 
superior to the signal set assignment of Figure 14.5-4 when used over an AWGN channel. 

14.18 In Figure 14.6-3 show X_b^i for b = 0, 1 and for 1 ≤ i ≤ 4 for both set partitioning labeling
and Gray labeling.



Multiple-Antenna Systems 


The use of multiple antennas at the receiver of a communication system is a standard
method for achieving spatial diversity to combat fading without expanding the band- 
width of the transmitted signal. Spatial diversity can also be achieved by using multiple 
antennas at the transmitter. For example, it is possible to achieve dual diversity with two 
transmitting antennas and one receiving antenna, as we demonstrate in this chapter. We 
will also demonstrate that multiple transmitting antennas can be used to create multiple 
spatial channels and thus provide the capability to increase the data rate of a wireless 
communication system. This method is called spatial multiplexing. 


■ 15.1 

CHANNEL MODELS FOR MULTIPLE-ANTENNA SYSTEMS 

A communication system employing N_T transmitting antennas and N_R receiving
antennas is generally called a multiple-input, multiple-output (MIMO) system, and the
resulting spatial channel in such a system is called a MIMO channel. The special case
in which N_T = N_R = 1 is called a single-input, single-output (SISO) system, and the
corresponding channel is called a SISO channel. A second special case is one in which
N_T = 1 and N_R ≥ 2. The resulting system is called a single-input, multiple-output
(SIMO) system, and the corresponding channel is called a SIMO channel. Finally, a
third special case is one in which N_T ≥ 2 and N_R = 1. The resulting system is called a
multiple-input, single-output (MISO) system, and the corresponding channel is called
a MISO channel.

In a MIMO system with N_T transmit antennas and N_R receive antennas, we denote
the equivalent lowpass channel impulse response between the jth transmit antenna and
the ith receive antenna as h_ij(τ; t), where τ is the age or delay variable and t is the time
variable.† Thus, the randomly time-varying channel is characterized by the N_R × N_T
matrix H(τ; t), defined as

$$\mathbf{H}(\tau;t) = \begin{bmatrix} h_{11}(\tau;t) & h_{12}(\tau;t) & \cdots & h_{1N_T}(\tau;t) \\ h_{21}(\tau;t) & h_{22}(\tau;t) & \cdots & h_{2N_T}(\tau;t) \\ \vdots & \vdots & & \vdots \\ h_{N_R1}(\tau;t) & h_{N_R2}(\tau;t) & \cdots & h_{N_RN_T}(\tau;t) \end{bmatrix}$$    (15.1-1)

†For convenience, the subscript on lowpass equivalent signals is omitted throughout this chapter.


Suppose that the signal transmitted from the jth transmit antenna is s_j(t), j =
1, 2, ..., N_T. Then the signal received at the ith antenna in the absence of noise may
be expressed as

$$r_i(t) = \sum_{j=1}^{N_T}\int_{-\infty}^{\infty} h_{ij}(\tau;t)\,s_j(t-\tau)\,d\tau = \sum_{j=1}^{N_T} h_{ij}(\tau;t) * s_j(\tau), \quad i = 1, 2, \ldots, N_R$$    (15.1-2)

where the asterisk denotes convolution. In matrix notation, Equation 15.1-2 is
expressed as

$$\mathbf{r}(t) = \mathbf{H}(\tau;t) * \mathbf{s}(\tau)$$    (15.1-3)

where s(t) is an N_T × 1 vector and r(t) is an N_R × 1 vector.

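For a frequency-nonselective channel, the channel matrix H is expressed as

$$\mathbf{H}(t) = \begin{bmatrix} h_{11}(t) & h_{12}(t) & \cdots & h_{1N_T}(t) \\ h_{21}(t) & h_{22}(t) & \cdots & h_{2N_T}(t) \\ \vdots & \vdots & & \vdots \\ h_{N_R1}(t) & h_{N_R2}(t) & \cdots & h_{N_RN_T}(t) \end{bmatrix}$$    (15.1-4)

In this case, the signal received at the ith antenna is simply

$$r_i(t) = \sum_{j=1}^{N_T} h_{ij}(t)\,s_j(t), \quad i = 1, 2, \ldots, N_R$$    (15.1-5)

and, in matrix form, the received signal vector r(t) is given as

$$\mathbf{r}(t) = \mathbf{H}(t)\,\mathbf{s}(t)$$    (15.1-6)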

Furthermore, if the time variations of the channel impulse response are very slow within
a time interval 0 ≤ t ≤ T, where T may be either the symbol interval or some general
time interval, Equation 15.1-6 may be simply expressed as

$$\mathbf{r}(t) = \mathbf{H}\,\mathbf{s}(t), \quad 0 \le t \le T$$    (15.1-7)

where H is constant within the time interval 0 ≤ t ≤ T.

The slowly time-variant frequency-nonselective channel model embodied in
Equation 15.1-7 is the simplest model for signal transmission in a MIMO channel. In the
following two subsections, we employ this model to illustrate the performance
characteristics of MIMO systems. At this point, we assume that the data to be transmitted are
uncoded. Coding for MIMO channels is treated in Section 15.4.


15.1-1 Signal Transmission Through a Slow Fading 
Frequency-Nonselective MIMO Channel 

Consider a wireless communication system that employs multiple transmitting and
receiving antennas, as shown in Figure 15.1-1. We assume that there are N_T transmitting
antennas and N_R receiving antennas. As illustrated in Figure 15.1-1, a block of N_T
symbols is converted from serial to parallel, and each symbol is fed to one of N_T identical
modulators, where each modulator is connected to a spatially separate antenna. Thus,
the N_T symbols are transmitted in parallel and are received on N_R spatially separated
receiving antennas.

In this section, we assume that each signal from a transmitting antenna to a receiving
antenna undergoes frequency-nonselective Rayleigh fading. We also assume that the
differences in propagation times of the signals from the N_T transmitting to the N_R
receiving antennas are small relative to the symbol duration T, so that for all practical
purposes, the signals from the N_T transmitting antennas to any receiving antenna are
synchronous. Hence, we can represent the equivalent lowpass received signals at the
receiving antennas in a signaling interval as

$$r_m(t) = \sum_{n=1}^{N_T} s_n h_{mn}\,g(t)+z_m(t), \quad 0 \le t \le T, \quad m = 1, 2, \ldots, N_R$$    (15.1-8)



FIGURE 15.1-1
A communication system with multiple transmitting and receiving antennas:
(a) transmitter; (b) receiver.
where g(t) is the pulse shape (impulse response) of the modulation filters; h_mn is the
complex-valued, circular zero-mean Gaussian channel gain between the nth transmitting
antenna and the mth receiving antenna; s_n is the symbol transmitted on the nth
antenna; and z_m(t) is a sample function of an AWGN process. The channel gains {h_mn}
are identically distributed and statistically independent from channel to channel. The
Gaussian sample functions {z_m(t)} are identically distributed and mutually statistically
independent, each having zero mean and two-sided power spectral density 2N_0. The
information symbols {s_n} are drawn from either a binary or an M-ary PSK or QAM
signal constellation.

The demodulator for the signal at each of the $N_R$ receiving antennas consists of
a filter matched to the pulse $g(t)$, whose output is sampled at the end of each symbol
interval. The output of the demodulator corresponding to the $m$th receiving antenna can
be represented as

$$y_m = \sum_{n=1}^{N_T} s_n h_{mn} + \eta_m, \qquad m = 1, 2, \ldots, N_R \tag{15.1-9}$$

where the energy of the signal pulse $g(t)$ is normalized to unity and $\eta_m$ is the additive
Gaussian noise component. The $N_R$ soft outputs from the demodulators are passed to
the signal detector. For mathematical convenience, Equation 15.1-9 may be expressed
in matrix form as

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \boldsymbol{\eta} \tag{15.1-10}$$

where $\mathbf{y} = [y_1\ y_2\ \cdots\ y_{N_R}]^t$, $\mathbf{s} = [s_1\ s_2\ \cdots\ s_{N_T}]^t$,
$\boldsymbol{\eta} = [\eta_1\ \eta_2\ \cdots\ \eta_{N_R}]^t$, and $\mathbf{H}$ is the
$N_R \times N_T$ matrix of channel gains. Figure 15.1-2 illustrates the discrete-time model for
the multiple transmitter and receiver signals in each signaling interval.
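As a rough illustration of the discrete-time model in Equation 15.1-10, the following pure-Python sketch draws one realization of the channel matrix and forms one received vector. The helper names (`cgauss`, `rayleigh_channel`, `received_vector`) and the noise normalization $E|\eta_m|^2 = N_0$ are assumptions made for the example, not part of the text.

```python
import random

def cgauss(rng, variance):
    """Zero-mean circular complex Gaussian sample with E|x|^2 = variance."""
    sigma = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def rayleigh_channel(rng, n_r, n_t):
    """N_R x N_T iid gains whose variances sum to one (an assumed normalization)."""
    var = 1.0 / (n_r * n_t)
    return [[cgauss(rng, var) for _ in range(n_t)] for _ in range(n_r)]

def received_vector(H, s, rng, n0):
    """y = H s + eta; E|eta_m|^2 = n0 is an assumed noise normalization."""
    return [sum(h * x for h, x in zip(row, s)) + cgauss(rng, n0) for row in H]

rng = random.Random(1)                  # fixed seed for repeatability
H = rayleigh_channel(rng, n_r=3, n_t=2)
s = [1.0, -1.0]                         # one BPSK symbol per transmitting antenna
y = received_vector(H, s, rng, n0=0.1)
```

Each entry of `y` mixes both transmitted symbols, which is the interchannel interference that the detectors must resolve.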

In the formulation of a MIMO system as described above, we observe that the
transmitted symbols on the $N_T$ transmitting antennas overlap totally in both time and
frequency. As a consequence, there is interchannel interference in the signals
$\{y_m, 1 \le m \le N_R\}$ received from the spatial channel. In the following subsection, we consider
three different detectors for recovering the transmitted data symbols in a MIMO system.





FIGURE 15.1-2 

Discrete-time model of the communication system with multiple transmit and receive antennas 
in a frequency-nonselective slow fading channel. 
15.1-2 Detection of Data Symbols in a MIMO System

Based on the frequency-nonselective MIMO channel model described in Sec- 
tion 15.1-1, we consider three different detectors for recovering the transmitted data 
symbols and evaluate their performance for Rayleigh fading and additive white Gaus- 
sian noise. Throughout this development, we assume that the detector knows the ele- 
ments of the channel matrix H perfectly. In practice, the elements of H are estimated 
by using channel probe signals. 


Maximum-Likelihood Detector (MLD) The MLD is the optimum detector in the
sense that it minimizes the probability of error. Since the additive noise terms at the $N_R$
receiving antennas are statistically independent and identically distributed (iid),
zero-mean Gaussian, the joint conditional PDF $p(\mathbf{y}|\mathbf{s})$ is Gaussian. Therefore, the MLD
selects the symbol vector $\mathbf{s}$ that minimizes the Euclidean distance metric

$$\mu(\mathbf{s}) = \sum_{m=1}^{N_R} \left| y_m - \sum_{n=1}^{N_T} h_{mn} s_n \right|^2 \tag{15.1-11}$$
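The exhaustive search implied by Equation 15.1-11 can be sketched in a few lines. The code below is a minimal illustration, assuming a small constellation and a channel matrix stored as nested lists; the function names `ml_metric` and `ml_detect` and the numerical values are hypothetical.

```python
from itertools import product

def ml_metric(H, y, s):
    """Euclidean metric of Eq. 15.1-11: sum_m |y_m - sum_n h_mn s_n|^2."""
    return sum(abs(y_m - sum(h * x for h, x in zip(row, s))) ** 2
               for row, y_m in zip(H, y))

def ml_detect(H, y, constellation):
    """Search all M**N_T candidate vectors and return the one with minimum metric."""
    n_t = len(H[0])
    return min(product(constellation, repeat=n_t),
               key=lambda s: ml_metric(H, y, s))

H = [[0.8 + 0.1j, -0.3 + 0.5j],
     [0.2 - 0.6j, 0.7 + 0.2j]]
s_true = (1, -1)
# Received vector: H s plus a small fixed perturbation standing in for noise
y = [sum(h * x for h, x in zip(row, s_true)) + 0.05 * (1 - 1j) for row in H]
s_hat = ml_detect(H, y, constellation=(-1, 1))
```

The search visits every one of the $M^{N_T}$ candidate vectors, which is why the approach is practical only for small constellations and few transmitting antennas.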


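Minimum Mean-Square-Error (MMSE) Detector The MMSE detector linearly
combines the received signals $\{y_m, 1 \le m \le N_R\}$ to form an estimate of the transmitted
symbols $\{s_n, 1 \le n \le N_T\}$. The linear combining is represented in matrix form as

$$\hat{\mathbf{s}} = \mathbf{W}^H \mathbf{y} \tag{15.1-12}$$

where $\mathbf{W}$ is an $N_R \times N_T$ weighting matrix, which is selected to minimize the mean
square error

$$J(\mathbf{W}) = E\left[\|\mathbf{e}\|^2\right] = E\left[\|\mathbf{s} - \mathbf{W}^H \mathbf{y}\|^2\right] \tag{15.1-13}$$

Minimization of $J(\mathbf{W})$ leads to the solution for the optimum weight vectors
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_{N_T}$ as

$$\mathbf{w}_n = \mathbf{R}_{yy}^{-1}\, \mathbf{r}_{s_n y}, \qquad n = 1, 2, \ldots, N_T \tag{15.1-14}$$

where $\mathbf{R}_{yy} = E[\mathbf{y}\mathbf{y}^H] = \mathbf{H}\mathbf{R}_{ss}\mathbf{H}^H + N_0 \mathbf{I}$ is the
$(N_R \times N_R)$ autocorrelation matrix of the received signal vector $\mathbf{y}$,
$\mathbf{R}_{ss} = E[\mathbf{s}\mathbf{s}^H]$, $\mathbf{r}_{s_n y} = E[s_n^* \mathbf{y}]$, and
$E[\boldsymbol{\eta}\boldsymbol{\eta}^H] = N_0 \mathbf{I}$. When
the signal vector has uncorrelated, zero-mean components, $\mathbf{R}_{ss}$ is a diagonal matrix.
Each component of the estimate $\hat{\mathbf{s}}$ is quantized to the closest transmitted symbol value.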
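A minimal sketch of the MMSE combiner of Equation 15.1-14 follows. It assumes unit-energy, uncorrelated symbols, so that $\mathbf{R}_{ss} = \mathbf{I}$ and $\mathbf{r}_{s_n y}$ equals the $n$th column of $\mathbf{H}$; the small Gaussian-elimination helper `solve` and the name `mmse_estimate` are hypothetical conveniences rather than anything prescribed by the text.

```python
def solve(A, b):
    """Solve A x = b for a small complex system by Gaussian elimination."""
    n = len(A)
    M = [list(row) + [b_i] for row, b_i in zip(A, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]              # partial pivoting
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in reversed(range(n)):                         # back substitution
        acc = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - acc) / M[r][r]
    return x

def mmse_estimate(H, y, n0):
    """Soft estimates w_n^H y with w_n = R_yy^{-1} h_n, assuming R_ss = I."""
    n_r, n_t = len(H), len(H[0])
    # R_yy = H H^H + n0 I
    R = [[sum(H[i][k] * H[j][k].conjugate() for k in range(n_t))
          + (n0 if i == j else 0.0) for j in range(n_r)] for i in range(n_r)]
    est = []
    for n in range(n_t):
        w = solve(R, [H[m][n] for m in range(n_r)])      # weight vector w_n
        est.append(sum(w_m.conjugate() * y_m for w_m, y_m in zip(w, y)))
    return est

H = [[0.8 + 0.1j, -0.3 + 0.5j],
     [0.2 - 0.6j, 0.7 + 0.2j]]
s = (1, -1)
y = [row[0] * s[0] + row[1] * s[1] for row in H]         # noise-free for clarity
soft = mmse_estimate(H, y, n0=1e-6)
s_hat = [1 if v.real >= 0 else -1 for v in soft]         # BPSK quantization
```

With a very small `n0` the estimate is close to the transmitted vector; as `n0` grows, the combiner trades residual interference against noise enhancement.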


Inverse Channel Detector (ICD) The ICD also forms an estimate of $\mathbf{s}$ by linearly
combining the received signals $\{y_m, 1 \le m \le N_R\}$. In this case, if we set $N_T = N_R$,
the weighting matrix $\mathbf{W}$ is selected so that the interchannel interference is completely
eliminated, i.e., $\mathbf{W}^H = \mathbf{H}^{-1}$; hence

$$\hat{\mathbf{s}} = \mathbf{H}^{-1}\mathbf{y} = \mathbf{s} + \mathbf{H}^{-1}\boldsymbol{\eta} \tag{15.1-15}$$

Each element of the estimate $\hat{\mathbf{s}}$ is then quantized to the closest transmitted symbol
value. We note that the ICD estimate $\hat{\mathbf{s}}$ is not corrupted by interchannel interference.
However, this also implies that the ICD does not exploit the signal diversity inherent
in the received signal, as we will observe below.

When $N_R > N_T$, the weighting matrix $\mathbf{W}$ may be selected as the pseudoinverse of
the channel matrix, i.e.,

$$\mathbf{W}^H = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H$$
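The pseudoinverse form of the ICD can be sketched for the special case of $N_T = 2$ transmitting antennas, where the $2 \times 2$ matrix $\mathbf{H}^H\mathbf{H}$ has a closed-form inverse. The function name `zero_forcing_2tx` and the numerical channel values are assumptions made for illustration only.

```python
def zero_forcing_2tx(H, y):
    """s_hat = (H^H H)^{-1} H^H y for N_T = 2 transmitting antennas, N_R >= 2."""
    # Gram matrix G = H^H H (2 x 2, Hermitian) and matched-filter vector b = H^H y
    g = [[sum(row[i].conjugate() * row[j] for row in H) for j in range(2)]
         for i in range(2)]
    b = [sum(row[i].conjugate() * y_m for row, y_m in zip(H, y)) for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    # Closed-form inverse of the 2 x 2 Gram matrix applied to b
    return [(g[1][1] * b[0] - g[0][1] * b[1]) / det,
            (g[0][0] * b[1] - g[1][0] * b[0]) / det]

H = [[0.8 + 0.1j, -0.3 + 0.5j],
     [0.2 - 0.6j, 0.7 + 0.2j],
     [0.1 + 0.4j, -0.5 - 0.1j]]                      # N_R = 3, N_T = 2
s = (1, -1)
y = [row[0] * s[0] + row[1] * s[1] for row in H]     # noise-free for clarity
s_soft = zero_forcing_2tx(H, y)
s_hat = [1 if v.real >= 0 else -1 for v in s_soft]   # BPSK quantization
```

For $N_R = N_T = 2$ the same expression reduces to $\mathbf{H}^{-1}\mathbf{y}$, as in Equation 15.1-15.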


Error Rate Performance of the Detectors The error rate performance of the three
detectors in a Rayleigh fading channel is most easily assessed by computer simulation of
the MIMO system. Figures 15.1-3 and 15.1-4 illustrate the bit error rate (BER) for
binary PSK modulation with $(N_T, N_R) = (2, 2)$ and $(N_T, N_R) = (2, 3)$, respectively. In
both cases, the variances of the channel gains are identical, and their sum is normalized
to unity, i.e.,

$$\sum_{n,m} E\left[|h_{mn}|^2\right] = 1 \tag{15.1-16}$$

The BER for binary PSK modulation is plotted as a function of the average SNR per
bit. With the normalization of the variances in the channel gains $\{h_{mn}\}$ as given by
Equation 15.1-16, the average received energy is simply the transmitted signal energy
per symbol.



FIGURE 15.1-3 

Performance of MLD, MMSE, and inverse channel detectors with $N_R = 2$ receiving antennas.

FIGURE 15.1-4

Performance of MLD and MMSE detectors with $N_R = 3$ receiving antennas.


The performance results in Figures 15.1-3 and 15.1-4 illustrate that the MLD
exploits the full diversity of order $N_R$ available in the received signal, and thus its
performance is comparable to that of a maximal ratio combiner (MRC) of the $N_R$
received signals, without the presence of interchannel interference, i.e.,
$(N_T, N_R) = (1, N_R)$. The two linear detectors, the MMSE detector and the ICD, achieve an error
rate that decreases inversely as the SNR raised to the $(N_R - 1)$ power for $N_T = 2$
transmitting antennas. Thus, when $N_R = 2$, the two linear detectors achieve no
diversity, and when $N_R = 3$, the linear detectors achieve dual diversity. We also note
that the MMSE detector outperforms the ICD, although both achieve the same order of
diversity. In general, with spatial multiplexing ($N_T$ antennas transmitting independent
data streams), the MLD achieves a diversity of order $N_R$, and the linear detectors
achieve a diversity of order $N_R - N_T + 1$, for any $N_R \ge N_T$. In effect, with $N_T$ antennas
transmitting independent data streams and $N_R$ receiving antennas, a linear detector has
$N_R$ degrees of freedom. In detecting any one data stream, in the presence of $N_T - 1$
interfering signals from the other transmitting antennas, the linear detectors utilize
$N_T - 1$ degrees of freedom to cancel the $N_T - 1$ interfering signals. Therefore, the
effective order of diversity for the linear detectors is $N_R - (N_T - 1) = N_R - N_T + 1$.

Let us now compare the computational complexity of the three detectors. We
observe that the complexity of the MLD grows exponentially as $M^{N_T}$, where $M$ is
the number of points in the signal constellation, whereas the linear detectors have a
complexity that grows linearly with $N_T$ and $N_R$. Therefore, the computational
complexity of the MLD is significantly larger when $M$ and $N_T$ are large. However, for a
small number of transmitting antennas and signal points, say $N_T \le 4$ and $M = 4$, the
computational complexity of the MLD is not excessive.
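The two scaling rules quoted above, the diversity order $N_R - N_T + 1$ of the linear detectors and the $M^{N_T}$ candidate vectors examined by the MLD, are easy to tabulate. The helper names below are hypothetical and the sketch does no more than evaluate the two expressions.

```python
def linear_diversity_order(n_t, n_r):
    """Diversity order N_R - N_T + 1 of the MMSE detector and ICD (N_R >= N_T)."""
    if n_r < n_t:
        raise ValueError("the expression applies for N_R >= N_T")
    return n_r - n_t + 1

def mld_candidates(m, n_t):
    """Number of symbol vectors the MLD compares: M ** N_T."""
    return m ** n_t

# Columns: N_T, N_R, M, linear-detector diversity order, MLD candidate count
for n_t, n_r, m in [(2, 2, 2), (2, 3, 2), (4, 4, 4), (4, 4, 16)]:
    print(n_t, n_r, m, linear_diversity_order(n_t, n_r), mld_candidates(m, n_t))
```

For $N_T = 4$ and $M = 4$ the MLD compares 256 vectors per signaling interval, whereas $M = 16$ with the same number of antennas already requires 65,536.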

Other Detector Structures and Algorithms 

As we have observed, the MLD is the optimum detector; hence, it minimizes the
symbol error probability. The two linear detectors, the ICD and the MMSE detector,
are suboptimum in terms of performance, but have low computational complexity.
Another class consists of nonlinear detectors, whose performance is generally better
than that of linear detectors but whose computational complexity is greater.

An example of a nonlinear detector is one that employs successive cancellation
of symbols from the received signal once the symbols are detected. One method for
accomplishing symbol cancellation is to employ the ICD or MMSE detector on the first
pass through the data. From the linearly detected symbols, we select the symbol having
the highest SNR, i.e., the one that is the most reliable. This symbol can be multiplied by the
appropriate column of the channel matrix $\mathbf{H}$ and the result subtracted from the received
signals, leaving us with a received signal containing $N_T - 1$ symbols. Then we repeat
the detection procedure for the received signal containing the $N_T - 1$ symbols. Thus,
$N_T$ iterations are employed to detect the $N_T$ transmitted symbols. This successive
cancellation technique, applied to a MIMO system, is essentially a multiuser detection
method that is further treated in Chapter 16.
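The successive cancellation procedure can be sketched for two BPSK streams. The code below is one possible reading of the steps described above, under stated assumptions: the first pass is the pseudoinverse (ICD) detector, reliability is judged by the noise gain $[(\mathbf{H}^H\mathbf{H})^{-1}]_{nn}$ of each stream, and the last remaining stream is detected by matched filtering. The name `sic_detect_2tx` is hypothetical.

```python
def sic_detect_2tx(H, y):
    """Successive cancellation for two BPSK streams (N_T = 2, N_R >= 2)."""

    def inner(a, b):
        """Inner product a^H b."""
        return sum(x.conjugate() * z for x, z in zip(a, b))

    def decide(v):
        """BPSK slicer: nearest symbol in {-1, +1}."""
        return 1 if v.real >= 0 else -1

    cols = [[row[n] for row in H] for n in range(2)]     # columns h_1, h_2 of H
    g00 = inner(cols[0], cols[0]).real
    g11 = inner(cols[1], cols[1]).real
    g01 = inner(cols[0], cols[1])
    det = g00 * g11 - abs(g01) ** 2                      # det(H^H H)
    b0, b1 = inner(cols[0], y), inner(cols[1], y)        # matched-filter outputs H^H y
    zf = [(g11 * b0 - g01 * b1) / det,                   # first pass: (H^H H)^{-1} H^H y
          (g00 * b1 - g01.conjugate() * b0) / det]
    # The noise gain of stream n is [(H^H H)^{-1}]_nn, i.e. g11/det and g00/det,
    # so the stream with the smaller gain is taken as the more reliable one.
    first = 0 if g11 <= g00 else 1
    second = 1 - first
    s_hat = [0, 0]
    s_hat[first] = decide(zf[first])
    # Cancel the detected symbol using its column of H, then combine the residual.
    residual = [y_m - h * s_hat[first] for y_m, h in zip(y, cols[first])]
    s_hat[second] = decide(inner(cols[second], residual))
    return s_hat

H = [[0.8 + 0.1j, -0.3 + 0.5j],
     [0.2 - 0.6j, 0.7 + 0.2j],
     [0.1 + 0.4j, -0.5 - 0.1j]]
y = [row[0] * 1 + row[1] * (-1) for row in H]            # noise-free, s = (1, -1)
s_hat = sic_detect_2tx(H, y)
```

Once the first symbol has been cancelled correctly, the second stream sees no interference and is combined over all $N_R$ antennas, which is the source of the performance gain over a purely linear detector.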

This is just one example of a nonlinear detection algorithm that may be employed 
to detect the data. Such schemes have greater computational complexity than the linear 
detectors described, but their performance is generally better. 

Another suboptimum detection method that is simpler to implement than MLD is
sphere detection (also called sphere decoding). In sphere detection, the search for the
most probable transmitted signal vector $\mathbf{s}$ is limited to a set of points $\mathbf{H}\mathbf{s}$ that lie within
an $N_R$-dimensional hypersphere of fixed radius centered on the received signal vector $\mathbf{y}$.
Thus, compared with MLD in which the search for the most probable signal vector $\mathbf{s}$
encompasses all possible points $\mathbf{H}\mathbf{s}$, sphere detection involves a search over a limited
set of received signal points. Consequently, the computational complexity is decreased
at a cost of an increase in the error probability. Clearly, as the radius of the sphere
is increased, the performance of the sphere detector approaches the performance of
the MLD. Computationally efficient algorithms for sphere detection, i.e., determining
the signal points $\mathbf{H}\mathbf{s}$ that lie inside a sphere of a given radius centered on the received
vector $\mathbf{y}$, have been published by Fincke and Pohst (1985), Viterbo and Boutros (1999),
Damen et al. (2000), deJong and Willink (2002), and Hochwald and ten Brink (2003).
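To make the search set concrete, the sketch below lists the candidate vectors whose noiseless points $\mathbf{H}\mathbf{s}$ fall inside a sphere of a given radius around $\mathbf{y}$. It deliberately enumerates every candidate, so it illustrates only the definition of the set and offers none of the savings of the published tree-search algorithms; the name `sphere_candidates` and the numerical values are assumptions.

```python
from itertools import product

def sphere_candidates(H, y, constellation, radius):
    """Return sorted (distance^2, s) pairs for points H s within `radius` of y."""
    n_t = len(H[0])
    inside = []
    for s in product(constellation, repeat=n_t):
        d2 = sum(abs(y_m - sum(h * x for h, x in zip(row, s))) ** 2
                 for row, y_m in zip(H, y))
        if d2 <= radius ** 2:
            inside.append((d2, s))
    return sorted(inside)

H = [[0.8 + 0.1j, -0.3 + 0.5j],
     [0.2 - 0.6j, 0.7 + 0.2j]]
y = [row[0] * 1 + row[1] * (-1) for row in H]        # noise-free, s = (1, -1)
small = sphere_candidates(H, y, (-1, 1), radius=1.0)
large = sphere_candidates(H, y, (-1, 1), radius=4.0)
```

With the small radius only the transmitted vector survives; with the large radius all four candidates are retained and the first entry of the sorted list coincides with the MLD decision.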

Another nonlinear method that exploits the signal diversity inherent in the received
signal vector $\mathbf{y}$ and provides near-MLD performance is based on lattice reduction. For
example, recall that if the elements of the $n$-dimensional signal vector $\mathbf{s}$ are taken from
a square QAM signal constellation, the set of signal vectors can be viewed as a subset
of an $n$-dimensional lattice. Hence, the set of noiseless received signal vectors $\mathbf{H}\mathbf{s}$ is a subset
of a lattice that is transformed (distorted) by the channel matrix $\mathbf{H}$. The basis vectors
for this transformed lattice are the columns of the matrix $\mathbf{H}$, which, in general, are not
orthogonal. However, the basis vectors of the transformed lattice may be orthogonalized
and reduced in magnitude, resulting in a new generator matrix $\mathbf{B}$ that is related to $\mathbf{H}$
through the transformation $\mathbf{B} = \mathbf{H}\mathbf{F}$, where the columns of $\mathbf{B}$ are orthogonal or, more
generally, nearly orthogonal, and $\mathbf{F}$ is a unimodular matrix with elements having integer
real and imaginary components, such that $\mathbf{F}$ satisfies the condition
$\det(\mathbf{F}) = \pm 1$ or $\pm j$. The inverse $\mathbf{F}^{-1}$ of such a matrix always exists.

We may use this basis transformation to express the received signal vector $\mathbf{y}$ as

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \boldsymbol{\eta} = (\mathbf{B}\mathbf{F}^{-1})\mathbf{s} + \boldsymbol{\eta}$$

We define the vector $\mathbf{w}$ as $\mathbf{w} = \mathbf{F}^{-1}\mathbf{s}$, so that $\mathbf{y}$ may be expressed as

$$\mathbf{y} = \mathbf{B}\mathbf{w} + \boldsymbol{\eta} = (\mathbf{H}\mathbf{F})\mathbf{w} + \boldsymbol{\eta}$$

Now, the ICD may be applied to detect the transformed signal vector $\mathbf{w}$ by inverting $\mathbf{B}$
and making hard decisions on the resulting elements of the vector $\mathbf{B}^{-1}\mathbf{y}$ to yield the
vector $\hat{\mathbf{w}}$. An estimate of the signal vector $\mathbf{s}$ is obtained by the linear transformation
$\hat{\mathbf{s}} = \mathbf{F}\hat{\mathbf{w}}$. This detection method has been shown to yield an order of diversity comparable
to MLD (see Yao and Wornell (2002)). Further discussion on lattice
reduction is given in Section 16.4-4, in the context of MIMO broadcast channels.
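The change of basis $\mathbf{B} = \mathbf{H}\mathbf{F}$ can be illustrated with a small real-valued toy example. Everything in the sketch below is an assumption chosen for clarity: the $2 \times 2$ real channel, the hand-picked unimodular matrix $\mathbf{F}$ (no reduction algorithm is run), integer-valued symbols so that rounding acts as the hard decision, and a single hand-picked noise vector.

```python
def matvec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def matmul(A, B):
    """Matrix-matrix product for nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(A):
    """Inverse of a 2 x 2 matrix in closed form."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

H = [[1.0, 0.9], [0.0, 0.2]]      # nearly parallel columns (poorly conditioned basis)
F = [[1, -1], [0, 1]]             # integer entries, det(F) = 1, so F is unimodular
B = matmul(H, F)                  # reduced basis: columns (1, 0) and about (-0.1, 0.2)

s = [2, -1]                       # integer-valued symbols (assumed for simplicity)
noise = [0.2, -0.08]              # one hand-picked noise realization
y = [v + e for v, e in zip(matvec(H, s), noise)]

s_icd = [round(v) for v in matvec(inv2(H), y)]      # ICD applied directly to H
w_hat = [round(v) for v in matvec(inv2(B), y)]      # hard decisions on B^{-1} y
s_lr = matvec(F, w_hat)                             # s_hat = F w_hat
```

For this particular noise vector the direct ICD decision is wrong in its first component, while the decision made in the reduced basis and mapped back through $\mathbf{F}$ recovers the transmitted vector. A single realization proves nothing about error rates; it only shows how the reduced basis lowers the noise enhancement.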

Signal Detection When Channel Is Known at the Transmitter and Receiver 

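The MLD, MMSE, and ICD techniques are based on knowing the channel matrix $\mathbf{H}$
at the receiver. Another linear processing technique may be devised when the channel
matrix $\mathbf{H}$ is known at the transmitter as well as the receiver. In this method, the singular
value decomposition (SVD) of the channel matrix $\mathbf{H}$, assumed to be of rank $r$, may be
expressed as

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H \tag{15.1-17}$$

where $\mathbf{U}$ is an $N_R \times r$ matrix, $\mathbf{V}$ is an $N_T \times r$ matrix, and $\boldsymbol{\Sigma}$ is an $r \times r$ diagonal matrix
with diagonal elements the singular values $\sigma_1, \sigma_2, \ldots, \sigma_r$ of the channel. The column
vectors of the matrices $\mathbf{U}$ and $\mathbf{V}$ are orthonormal. Hence $\mathbf{U}^H\mathbf{U} = \mathbf{I}_r$ and $\mathbf{V}^H\mathbf{V} = \mathbf{I}_r$,
where $\mathbf{I}_r$ is the $r \times r$ identity matrix. If we process an $r \times 1$ signal vector $\mathbf{s}$ at the
transmitter by the linear transformation

$$\mathbf{s}_v = \mathbf{V}\mathbf{s} \tag{15.1-18}$$

then the received signal vector $\mathbf{y}$ is

$$\mathbf{y} = \mathbf{H}\mathbf{s}_v + \boldsymbol{\eta} = \mathbf{H}\mathbf{V}\mathbf{s} + \boldsymbol{\eta} \tag{15.1-19}$$

At the receiver, we process the received signal vector $\mathbf{y}$ by the linear transformation
$\mathbf{U}^H$. Thus,

$$\begin{aligned}
\hat{\mathbf{s}} &= \mathbf{U}^H\mathbf{y} = \mathbf{U}^H\mathbf{H}\mathbf{V}\mathbf{s} + \mathbf{U}^H\boldsymbol{\eta} \\
&= \mathbf{U}^H\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H\mathbf{V}\mathbf{s} + \mathbf{U}^H\boldsymbol{\eta} = \boldsymbol{\Sigma}\mathbf{s} + \mathbf{U}^H\boldsymbol{\eta}
\end{aligned} \tag{15.1-20}$$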
FIGURE 15.1-5

Signal processing and detection in a MIMO system when the channel is known at the
transmitter and the receiver.


Therefore, the elements of the received signal are decoupled and may be detected
individually. The scaling of the transmitted symbols by the singular values $\{\sigma_i\}$ may be
compensated either at the transmitter by using the linear transformation $\mathbf{V}\boldsymbol{\Sigma}^{-1}$ in place
of $\mathbf{V}$ or at the receiver by the linear transformation $\boldsymbol{\Sigma}^{-1}\mathbf{U}^H$. A block diagram of the
MIMO communication system is illustrated in Figure 15.1-5.
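The decoupling obtained by processing with $\mathbf{V}$ at the transmitter and $\mathbf{U}^H$ at the receiver can be checked numerically. The sketch below avoids computing an SVD; instead it builds a real $2 \times 2$ channel from known factors (two rotation matrices and a chosen diagonal $\boldsymbol{\Sigma}$), which is an assumption made purely to keep the example self-contained.

```python
import math

def rot(theta):
    """2 x 2 rotation matrix, used here as a real orthonormal factor."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """Matrix-matrix product for nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Transpose (equal to the Hermitian transpose for real matrices)."""
    return [list(col) for col in zip(*A)]

U, V = rot(0.4), rot(-1.1)
Sigma = [[2.0, 0.0], [0.0, 0.5]]                # singular values 2 and 0.5
H = matmul(matmul(U, Sigma), transpose(V))      # H = U Sigma V^H

s = [[1.0], [-1.0]]                             # symbol vector as a column
s_v = matmul(V, s)                              # transmitter processing, Eq. 15.1-18
y = matmul(H, s_v)                              # noise-free reception, Eq. 15.1-19
s_hat = matmul(transpose(U), y)                 # receiver processing, Eq. 15.1-20
```

In the absence of noise, each element of `s_hat` equals the corresponding symbol scaled by its singular value, here approximately 2 and -0.5, with no contribution from the other stream.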

From the expression for the estimate of the signal vector s given by Equa- 
tion 15.1-20 we observe that the SVD method does not exploit the signal diversity 
provided by the channel. This is the main disadvantage in decoupling the received 
signal vector y by means of the SVD. 


15.1-3 Signal Transmission Through a Slow Fading Frequency-Selective 
MIMO Channel 

In this section we consider transmission through a frequency-selective MIMO channel
in which the time variations of the impulse responses $\{h_{ij}(\tau; t)\}$ are very slow compared
to the symbol rate $1/T$. According to Equations 15.1-2 and 15.1-3, the signal received
from the frequency-selective MIMO channel may be expressed as

$$r_i(t) = \sum_{j=1}^{N_T} \int_{-\infty}^{\infty} h_{ij}(\tau; t)\, s_j(t - \tau)\, d\tau + z_i(t), \qquad i = 1, 2, \ldots, N_R \tag{15.1-21}$$

where $z_i(t)$ represents the additive noise at the $i$th receive antenna. Let the signal
transmitted in the $n$th signal interval be $s_j(t) = s_j(n)\, g(t - nT)$, where $g(t)$ is the
impulse response of the modulation filters and $\{s_j(n)\}$ is the set of $N_T$ information
symbols. After substituting for $s_j(t)$ in Equation 15.1-21, we obtain

$$r_i(t) = \sum_{n} \sum_{j=1}^{N_T} s_j(n) \int_{-\infty}^{\infty} h_{ij}(\tau; t)\, g(t - nT - \tau)\, d\tau + z_i(t), \qquad i = 1, 2, \ldots, N_R \tag{15.1-22}$$

It is convenient to process the received signal in sampled form. Consequently, we
may sample the received signal $r_i(t)$ at some suitable sampling rate $F_s = J/T$, where
$J$ is a positive integer. For example, we may select $J = 2$, so that there are two samples
per symbol. Such a sampling rate is appropriate when the impulse response $g(t)$ of the
modulation filters is band-limited to $|f| \le 1/T$.
At each antenna, the received signal is passed through a bank of $N_T$ finite-duration
impulse response (FIR) filters, where each filter spans $K$ samples. The filter coefficients
at time instant $n$ are denoted as $\{a_{ij}(k; n),\ k = 0, 1, \ldots, K - 1\}$ and are assumed to be
complex-valued in general. Suppose that these FIR filters function as linear equalizers.
Then the outputs of the FIR filters from the $N_R$ receive antennas may be used to
form estimates of the transmitted information symbols. Thus, the estimate of the $j$th
information symbol transmitted at time instant $n$ may be expressed as

$$\hat{s}_j(n) = \sum_{i=1}^{N_R} \left[ \sum_{k=0}^{K-1} a_{ij}(k; n)\, r_i(n - k) \right], \qquad j = 1, 2, \ldots, N_T \tag{15.1-23}$$

where $\hat{s}_j(n)$ denotes the estimate of $s_j(n)$.

The estimates given by Equation 15.1-23 can be expressed more compactly in 
matrix form as 


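$$\hat{\mathbf{s}}(n) = \mathbf{A}^H(n)\, \mathbf{r}(n)$$

where the matrix $\mathbf{A}(n)$ and the vector $\mathbf{r}(n)$ are defined as

$$\mathbf{A}(n) = \begin{bmatrix}
\mathbf{a}^*_{11}(n) & \mathbf{a}^*_{12}(n) & \cdots & \mathbf{a}^*_{1N_T}(n) \\
\mathbf{a}^*_{21}(n) & \mathbf{a}^*_{22}(n) & \cdots & \mathbf{a}^*_{2N_T}(n) \\
\vdots & \vdots & & \vdots \\
\mathbf{a}^*_{N_R 1}(n) & \mathbf{a}^*_{N_R 2}(n) & \cdots & \mathbf{a}^*_{N_R N_T}(n)
\end{bmatrix} \tag{15.1-24}$$

$$\mathbf{r}(n) = \begin{bmatrix} \mathbf{r}_1(n) \\ \mathbf{r}_2(n) \\ \vdots \\ \mathbf{r}_{N_R}(n) \end{bmatrix} \tag{15.1-25}$$

where $\{\mathbf{a}_{ij}(n)\}$ and $\{\mathbf{r}_i(n)\}$ are column vectors of dimension $K$ and
$\mathbf{A}^H(n) = [\mathbf{A}(n)]^H = [\mathbf{a}^*_{ij}(n)]^H = [\mathbf{a}^t_{ji}(n)]$. Figure 15.1-6 illustrates the structure of the demodulator for
$N_T = 2$ transmitting antennas and $N_R = 3$ receiving antennas.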

The estimate $\hat{\mathbf{s}}(n)$ is fed to the detector, which compares each element of $\hat{\mathbf{s}}(n)$
with the possible transmitted symbols and selects the symbol $\tilde{s}_j(n)$ that is closest in
Euclidean distance to $\hat{s}_j(n)$.

When the channel impulse responses $\{h_{ij}(\tau; t)\}$ change slowly with time, the
coefficients of the FIR equalizers can be adjusted adaptively to minimize the mean
square error (MSE) between the desired data symbols $\{s_j(n),\ j = 1, 2, \ldots, N_T\}$ and the
estimates $\{\hat{s}_j(n),\ j = 1, 2, \ldots, N_T\}$. Initial adjustment of the coefficients $\{\mathbf{a}_{ij}(n)\}$ may
be accomplished by transmitting a finite-duration sequence of training symbol vectors
from the $N_T$ transmit antennas. In the training mode, the error signal is formed as

$$\mathbf{e}(n) = \mathbf{s}(n) - \hat{\mathbf{s}}(n) = \mathbf{s}(n) - \mathbf{A}^H(n)\, \mathbf{r}(n) \tag{15.1-26}$$

or, equivalently, as

$$e_j(n) = s_j(n) - \hat{s}_j(n), \qquad j = 1, 2, \ldots, N_T \tag{15.1-27}$$

and the equalizer coefficients are adjusted to minimize

$$\text{MSE}_j = E\left[|e_j(n)|^2\right], \qquad j = 1, 2, \ldots, N_T \tag{15.1-28}$$

FIGURE 15.1-6

Signal demodulation with linear equalizers for the frequency-selective channel.

Either the LMS algorithm or the RLS algorithm described in Sections 10.1 and 
10.4 may be used to adjust the equalizer coefficients. Following the training symbols, 
in the data transmission mode, the detector outputs may be used in place of the training 
symbols to form the error signal, i.e., 


$$e_j(n) = \tilde{s}_j(n) - \hat{s}_j(n), \qquad j = 1, 2, \ldots, N_T \tag{15.1-29}$$

where $\tilde{s}_j(n)$ is the output of the detector for the $j$th symbol at time $n$, which is the
symbol nearest in distance to the estimate $\hat{s}_j(n)$.
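The training-mode adaptation can be sketched with a standard complex LMS recursion. To keep the example short it is reduced to a single transmit and a single receive antenna with symbol-spaced taps, a noise-free two-path channel with fixed gains 1 and 0.5, and a step size chosen by hand; all of these, together with the name `lms_train`, are assumptions for illustration and are not taken from Sections 10.1 or 10.4.

```python
import random

def lms_train(received, training, num_taps, step):
    """Adapt FIR taps a(k) so that sum_k conj(a(k)) r(n-k) tracks the training symbols."""
    a = [0j] * num_taps
    errors = []
    for n in range(len(training)):
        # Tap-delay-line contents r(n), r(n-1), ..., with zeros before the first sample
        window = [received[n - k] if n - k >= 0 else 0j for k in range(num_taps)]
        s_hat = sum(a_k.conjugate() * r_k for a_k, r_k in zip(a, window))
        e = training[n] - s_hat                          # error signal, as in Eq. 15.1-27
        a = [a_k + step * r_k * e.conjugate() for a_k, r_k in zip(a, window)]
        errors.append(abs(e) ** 2)
    return a, errors

rng = random.Random(7)
symbols = [rng.choice((-1.0, 1.0)) for _ in range(2000)]
# Two-path channel with symbol-spaced paths (illustrative fixed gains 1 and 0.5)
rx = [symbols[n] + (0.5 * symbols[n - 1] if n > 0 else 0.0)
      for n in range(len(symbols))]
taps, err = lms_train(rx, symbols, num_taps=5, step=0.05)
early = sum(err[:100]) / 100          # average squared error at the start of training
late = sum(err[-100:]) / 100          # average squared error after convergence
```

In decision-directed operation the same update would be driven by the detector output in place of the training symbol, as in Equation 15.1-29.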

EXAMPLE 15.1-1. Consider a MIMO system in which the channel impulse responses
are

$$h_{ij}(\tau) = h_{ij}^{(1)}\, \delta(\tau) + h_{ij}^{(2)}\, \delta(\tau - T), \qquad i = 1, 2, \ldots, N_R, \quad j = 1, 2, \ldots, N_T$$

where $T$ is the symbol interval. In this case, the channel is time dispersive with
intersymbol interference occurring over two successive symbols. The channel coefficients
$\{h_{ij}^{(1)}\}$ and $\{h_{ij}^{(2)}\}$ are assumed to be fixed over a time interval spanning 2000 symbols,
and are zero-mean complex-valued Gaussian random variables with variances

$$\sigma_{ijk}^2 = E\left[\left|h_{ij}^{(k)}\right|^2\right], \qquad k = 1, 2$$

The sum of all these variances is normalized to unity, i.e.,

$$\sum_{k=1}^{2} \sum_{j=1}^{N_T} \sum_{i=1}^{N_R} \sigma_{ijk}^2 = 1$$

A Monte Carlo simulation of the performance of the linear equalizers for the case
in which the two multipath components have equal variance and the modulation is
binary PSK is shown in Figure 15.1-7 for $(N_T, N_R) = (1, 1)$, $(2, 2)$, and $(2, 3)$. The
linear equalizers were trained initially with the LMS algorithm for 1000 symbols. The
simulations were performed for 1000 different channel realizations. The maximum
achievable diversity is $2N_R$, where the factor of 2 is due to the multipath.

We observe that the effect of the ISI on the performance of the MIMO system is
very severe. There is a significant loss in the performance of the (2, 2) and (2, 3) MIMO
systems due to the ISI. This effect is due to the basic limitation of linear equalizers in
mitigating ISI in fading multipath channels.

FIGURE 15.1-7

Performance of linear equalizer for two-path channel with $(N_T, N_R)$ antennas for spatial
multiplexing.

Other Equalizer Structures 

The linear adaptive equalizer described above for the MIMO channel is the simplest 
equalization technique from the viewpoint of computational complexity. To achieve bet- 
ter performance, one may employ a more powerful equalizer, in particular, a decision- 
feedback equalizer (DFE) or a maximum-likelihood sequence detector (MLSD). 

Figure 15.1-8 illustrates the structure of a DFE for a MIMO channel with
$N_T = N_R = 2$ antennas. The two feedforward filters at each receive antenna are structurally
identical to the FIR filters in a linear equalizer structure. Typically, these FIR filters
have fractionally spaced taps. The two feedback filters connected to each detector
are symbol-spaced FIR filters. Their function is to suppress the ISI that is inherent
in previously detected symbols (so-called postcursors). Thus, the estimate of the $j$th
information symbol transmitted at time instant $n$ may be expressed as

$$\hat{s}_j(n) = \sum_{i=1}^{N_R} \left\{ \sum_{k=-K_1}^{0} a_{ij}(k; n)\, r_i(n - k) - \sum_{k=1}^{K_2} b_{ij}(k; n)\, \tilde{s}_i(n - k) \right\} \tag{15.1-30}$$

where $K_1 + 1$ is the number of tap coefficients in each of the feedforward filters and
$K_2$ is the number of tap coefficients $\{b_{ij}(k; n)\}$ in each of the feedback filters.



FIGURE 15.1-8 

Signal demodulation with decision-feedback equalizers for the frequency-selective channel. 
As in the case of the linear equalizers for the MIMO channel, the MSE criterion
may be used to adjust the coefficients of the feedforward and feedback filters. Training 
symbols are usually needed to adjust the equalizer coefficients initially. When data 
are transmitted in frames, training symbols may be inserted in each frame for initial 
adjustment of the DFE coefficients. During the transmission of information symbols, 
the symbols at the output of the detector may be used for coefficient adjustment. We 
note that the computational complexity of the DFE is comparable to that of the linear 
MIMO equalizer. 

EXAMPLE 15.1-2. Consider the MIMO system described in Example 15.1-1, where
the linear equalizers are replaced by decision-feedback equalizers. The error rate
performance of the MIMO system with DFEs, obtained by Monte Carlo simulation, is shown
in Figure 15.1-9. In comparing the performance of the MIMO system with DFEs and
with linear equalizers, we observe that the DFEs generally yield better performance.
Nevertheless, there is still a significant loss in performance due to ISI.

FIGURE 15.1-9

Performance of DFEs for two-path channel with $(N_T, N_R)$ antennas for spatial multiplexing.

The best performance in the presence of ISI is obtained when the equalization
algorithm is based on the MLSD criterion. A multichannel version of the Viterbi algorithm
is computationally efficient in implementing MLSD for a MIMO channel with ISI. The
major impediment in the implementation of the Viterbi algorithm is its computational
complexity, which grows exponentially as $M^L$, where $M$ is the size of the symbol
constellation and $L$ is the span of the channel multipath dispersion expressed in terms of
the number of information symbols spanned. Consequently, except for channels with
relatively small multipath spread, e.g., $L = 2$ or 3, and small signal constellations,
e.g., $M = 2$ or 4, the implementation complexity of the Viterbi algorithm for a MIMO
system is very high compared to that for a DFE.


15.2 CAPACITY OF MIMO CHANNELS

In this section, we evaluate the capacity of MIMO channel models. For mathematical
convenience, we limit our treatment to frequency-nonselective channels which are
assumed to be known to the receiver. Thus, the channel is characterized by an $N_R \times N_T$
channel matrix $\mathbf{H}$ with elements $\{h_{ij}\}$. In any signal interval, the elements $\{h_{ij}\}$ are
complex-valued random variables. In the special case of a Rayleigh fading channel,
the $\{h_{ij}\}$ are zero-mean complex-valued Gaussian random variables with uncorrelated
real and imaginary components (circularly symmetric). When the $\{h_{ij}\}$ are statistically
independent and identically distributed complex-valued Gaussian random variables,
the MIMO channel is spatially white.


15.2-1 Mathematical Preliminaries 

By using a singular value decomposition (SVD), the channel matrix H with rank r may 
be expressed as 

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H \tag{15.2-1}$$

where $\mathbf{U}$ is an $N_R \times r$ matrix, $\mathbf{V}$ is an $N_T \times r$ matrix, and $\boldsymbol{\Sigma}$ is an $r \times r$ diagonal matrix
with diagonal elements the singular values $\sigma_1, \sigma_2, \ldots, \sigma_r$ of the channel. The singular
values $\{\sigma_i\}$ are strictly positive and are ordered in decreasing order, i.e., $\sigma_i \ge \sigma_{i+1}$.
The column vectors of $\mathbf{U}$ and $\mathbf{V}$ are orthonormal. Hence $\mathbf{U}^H\mathbf{U} = \mathbf{I}_r$ and $\mathbf{V}^H\mathbf{V} = \mathbf{I}_r$,
where $\mathbf{I}_r$ is an $r \times r$ identity matrix. Therefore, the SVD of the channel matrix $\mathbf{H}$ may
be expressed as

$$\mathbf{H} = \sum_{i=1}^{r} \sigma_i\, \mathbf{u}_i \mathbf{v}_i^H \tag{15.2-2}$$

where $\{\mathbf{u}_i\}$ are the column vectors of $\mathbf{U}$, which are called the left singular vectors of $\mathbf{H}$,
and $\{\mathbf{v}_i\}$ are the column vectors of $\mathbf{V}$, which are called the right singular vectors of $\mathbf{H}$.

We also consider the decomposition of the $N_R \times N_R$ square matrix $\mathbf{H}\mathbf{H}^H$. This
matrix may be decomposed as

$$\mathbf{H}\mathbf{H}^H = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H \tag{15.2-3}$$

where $\mathbf{Q}$ is the $N_R \times N_R$ modal matrix with orthonormal column vectors (eigenvectors),
i.e., $\mathbf{Q}^H\mathbf{Q} = \mathbf{I}_{N_R}$, and $\boldsymbol{\Lambda}$ is an $N_R \times N_R$ diagonal matrix with diagonal elements
$\{\lambda_i,\ i = 1, 2, \ldots, N_R\}$, which are the eigenvalues of $\mathbf{H}\mathbf{H}^H$. With the eigenvalues
numbered in decreasing order ($\lambda_i \ge \lambda_{i+1}$), it can be easily demonstrated that the
eigenvalues of $\mathbf{H}\mathbf{H}^H$ are related to the singular values in the SVD of $\mathbf{H}$ as follows:

$$\lambda_i = \begin{cases} \sigma_i^2, & i = 1, 2, \ldots, r \\ 0, & i = r + 1, \ldots, N_R \end{cases} \tag{15.2-4}$$


A useful metric is the Frobenius norm of H, which is defined as 


$$\|\mathbf{H}\|_F = \sqrt{\sum_{i=1}^{N_R} \sum_{j=1}^{N_T} |h_{ij}|^2} = \sqrt{\operatorname{trace}\left(\mathbf{H}\mathbf{H}^H\right)} = \sqrt{\sum_{i=1}^{N_R} \lambda_i} \tag{15.2-5}$$

We shall observe below that the squared Frobenius norm $\|\mathbf{H}\|_F^2$ is a parameter that
determines the performance of MIMO communication systems. The statistical properties
of $\|\mathbf{H}\|_F^2$ can be determined for various fading channel conditions. For example, in the
case of Rayleigh fading, $|h_{ij}|^2$ is a chi-squared random variable with two degrees of
freedom. When the $\{h_{ij}\}$ are iid (spatially white MIMO channel) with unit variance, the
probability density function of $\|\mathbf{H}\|_F^2$ is chi-squared with $2N_R N_T$ degrees of freedom;
i.e., if $X = \|\mathbf{H}\|_F^2$,

$$p(x) = \frac{x^{n-1}}{(n-1)!}\, e^{-x}, \qquad x \ge 0 \tag{15.2-6}$$

where $n = N_R N_T$.


15.2-2 Capacity of a Frequency-Nonselective Deterministic MIMO Channel 

Let us consider a frequency-nonselective AWGN MIMO channel characterized by
the matrix $\mathbf{H}$. Let $\mathbf{s}$ denote the $N_T \times 1$ transmitted signal vector, which is statistically
stationary and has zero mean and autocovariance matrix $\mathbf{R}_{ss}$. In the presence of AWGN,
the $N_R \times 1$ received signal vector $\mathbf{y}$ may be expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \boldsymbol{\eta} \tag{15.2-7}$$

where $\boldsymbol{\eta}$ is the $N_R \times 1$ zero-mean Gaussian noise vector with covariance matrix
$\mathbf{R}_{\eta\eta} = N_0 \mathbf{I}_{N_R}$. Although $\mathbf{H}$ is a realization of a random matrix, in this section we treat $\mathbf{H}$ as
deterministic and known to the receiver.
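To determine the capacity of the MIMO channel, we first compute the mutual
information between the transmitted signal vector $\mathbf{s}$ and the received vector $\mathbf{y}$, denoted
as $I(\mathbf{s}; \mathbf{y})$, and then determine the probability distribution of the signal vector $\mathbf{s}$ that
maximizes $I(\mathbf{s}; \mathbf{y})$. Thus,

$$C = \max_{p(\mathbf{s})} I(\mathbf{s}; \mathbf{y}) \tag{15.2-8}$$

where $C$ is the channel capacity in bits per second per hertz (bps/Hz). It can be shown
(see Telatar (1999) and Neeser and Massey (1993)) that $I(\mathbf{s}; \mathbf{y})$ is maximized when
$\mathbf{s}$ is a zero-mean, circularly symmetric, complex Gaussian vector; hence, $C$ is only
dependent on the covariance of the signal vector. The resulting capacity of the MIMO
channel is

$$C = \max_{\operatorname{tr}(\mathbf{R}_{ss}) = \mathcal{E}_s} \log_2 \det\left( \mathbf{I}_{N_R} + \frac{1}{N_0}\, \mathbf{H}\mathbf{R}_{ss}\mathbf{H}^H \right) \quad \text{bps/Hz} \tag{15.2-9}$$

where $\operatorname{tr}(\mathbf{R}_{ss})$ denotes the trace of the signal covariance $\mathbf{R}_{ss}$. This is the maximum rate
per hertz that can be transmitted reliably (without errors) over the MIMO channel for
any given realization of the channel matrix $\mathbf{H}$.

In the important practical case where the signals among the $N_T$ transmitters are
statistically independent symbols with energy per symbol equal to $\mathcal{E}_s/N_T$, the signal
covariance matrix is diagonal, i.e.,

$$\mathbf{R}_{ss} = \frac{\mathcal{E}_s}{N_T}\, \mathbf{I}_{N_T} \tag{15.2-10}$$

and $\operatorname{trace}(\mathbf{R}_{ss}) = \mathcal{E}_s$. In this case, the expression for the capacity of the MIMO channel
simplifies to

$$C = \log_2 \det\left( \mathbf{I}_{N_R} + \frac{\mathcal{E}_s}{N_T N_0}\, \mathbf{H}\mathbf{H}^H \right) \quad \text{bps/Hz} \tag{15.2-11}$$

The capacity formula in Equation 15.2-11 can also be expressed in terms of the
eigenvalues of $\mathbf{H}\mathbf{H}^H$ by using the decomposition $\mathbf{H}\mathbf{H}^H = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H$. Thus,

$$\begin{aligned}
C &= \log_2 \det\left( \mathbf{I}_{N_R} + \frac{\mathcal{E}_s}{N_T N_0}\, \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H \right) \\
&= \log_2 \det\left( \mathbf{I}_{N_R} + \frac{\mathcal{E}_s}{N_T N_0}\, \mathbf{Q}^H\mathbf{Q}\boldsymbol{\Lambda} \right) \\
&= \log_2 \det\left( \mathbf{I}_{N_R} + \frac{\mathcal{E}_s}{N_T N_0}\, \boldsymbol{\Lambda} \right) \\
&= \sum_{i=1}^{r} \log_2\left( 1 + \frac{\mathcal{E}_s}{N_T N_0}\, \lambda_i \right)
\end{aligned} \tag{15.2-12}$$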

where r is the rank of the channel matrix H. 
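The equivalence between the determinant form and the eigenvalue form of the capacity can be checked numerically for $N_R = 2$, where both the determinant and the eigenvalues of $\mathbf{H}\mathbf{H}^H$ have closed-form expressions. In the sketch below, `snr` stands for the ratio $\mathcal{E}_s/N_0$; the function names and the channel values are assumptions for illustration.

```python
import math

def gram(H):
    """H H^H for an N_R x N_T matrix given as nested lists."""
    return [[sum(a * b.conjugate() for a, b in zip(ri, rj)) for rj in H] for ri in H]

def capacity_det_2rx(H, snr):
    """log2 det(I + (snr / N_T) H H^H) for N_R = 2, with snr = E_s / N_0."""
    n_t = len(H[0])
    G = gram(H)
    c = snr / n_t
    det = (1 + c * G[0][0]) * (1 + c * G[1][1]) - (c * G[0][1]) * (c * G[1][0])
    return math.log2(det.real)

def capacity_eig_2rx(H, snr):
    """Sum over the eigenvalues of H H^H of log2(1 + (snr / N_T) lambda_i), N_R = 2."""
    n_t = len(H[0])
    G = gram(H)
    tr = (G[0][0] + G[1][1]).real
    det = (G[0][0] * G[1][1] - G[0][1] * G[1][0]).real
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    lams = (tr / 2 + disc, tr / 2 - disc)        # eigenvalues of a 2 x 2 Hermitian matrix
    return sum(math.log2(1 + snr / n_t * lam) for lam in lams)

H = [[0.8 + 0.1j, -0.3 + 0.5j],
     [0.2 - 0.6j, 0.7 + 0.2j]]
c_det = capacity_det_2rx(H, snr=10.0)
c_eig = capacity_eig_2rx(H, snr=10.0)
```

The two values agree to within rounding error. For the identity channel with `snr = 10`, both reduce to $2\log_2 6$, i.e., two parallel SISO channels each carrying half of the transmitted energy.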
It is interesting to note that in a SISO channel, $\lambda_1 = |h_{11}|^2$ so that

$$C_{\text{SISO}} = \log_2\left( 1 + \frac{\mathcal{E}_s}{N_0}\, |h_{11}|^2 \right) \quad \text{bps/Hz} \tag{15.2-13}$$

We observe that the capacity of the MIMO channel is simply equal to the sum of the
capacities of $r$ SISO channels, where the transmit energy per SISO channel is $\mathcal{E}_s/N_T$
and the corresponding channel gain is equal to the eigenvalue $\lambda_i$.

Capacity of SIMO Channel 

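A SIMO channel ($N_T = 1$, $N_R \ge 2$) is characterized by the vector
$\mathbf{h} = [h_{11}\ h_{21}\ \cdots\ h_{N_R 1}]^t$. In this case, the rank of the channel matrix is unity, and the eigenvalue $\lambda_1$ is
given as

$$\lambda_1 = \|\mathbf{h}\|_F^2 = \sum_{i=1}^{N_R} |h_{i1}|^2 \tag{15.2-14}$$

Therefore, the capacity of the SIMO channel, when the $N_R$ elements $\{h_{i1}\}$ of the channel
are deterministic and known to the receiver, is

$$\begin{aligned}
C_{\text{SIMO}} &= \log_2\left( 1 + \frac{\mathcal{E}_s}{N_0}\, \|\mathbf{h}\|_F^2 \right) \\
&= \log_2\left( 1 + \frac{\mathcal{E}_s}{N_0} \sum_{i=1}^{N_R} |h_{i1}|^2 \right) \quad \text{bps/Hz}
\end{aligned} \tag{15.2-15}$$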

Capacity of MISO Channel 


A MISO channel ($N_T \ge 2$, $N_R = 1$) is characterized by the vector
$\mathbf{h} = [h_{11}\ h_{12}\ \cdots\ h_{1N_T}]^t$. In this case, the rank of the channel matrix is also unity, and the eigenvalue $\lambda_1$
is given as

$$\lambda_1 = \|\mathbf{h}\|_F^2 = \sum_{j=1}^{N_T} |h_{1j}|^2 \tag{15.2-16}$$

The resulting capacity of the MISO channel when the $N_T$ elements $\{h_{1j}\}$ of the channel
are deterministic and known to the receiver is

$$\begin{aligned}
C_{\text{MISO}} &= \log_2\left( 1 + \frac{\mathcal{E}_s}{N_T N_0}\, \|\mathbf{h}\|_F^2 \right) \\
&= \log_2\left( 1 + \frac{\mathcal{E}_s}{N_T N_0} \sum_{j=1}^{N_T} |h_{1j}|^2 \right) \quad \text{bps/Hz}
\end{aligned} \tag{15.2-17}$$

It is interesting to note that for the same $\|\mathbf{h}\|_F^2$, the capacity of the SIMO channel is
greater than the capacity of the MISO channel when the channel is known to the receiver
only. The reason is that, under the constraint that the total transmitted energy in the
two systems be identical, the energy $\mathcal{E}_s$ in the MISO system is split evenly among the
$N_T$ transmit antennas, whereas in the SIMO system, the transmitter energy $\mathcal{E}_s$ is used
by the single antenna. Note also that in both SIMO and MISO channels, the capacity
grows logarithmically as a function of $\|\mathbf{h}\|_F^2$.
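The comparison between Equations 15.2-15 and 15.2-17 for the same channel vector can be made concrete with a short calculation. In the sketch below, `snr` denotes $\mathcal{E}_s/N_0$ as a ratio, and the function names and the example vector are hypothetical.

```python
import math

def capacity_simo(h, snr):
    """Eq. 15.2-15: log2(1 + snr * ||h||^2), with snr = E_s / N_0."""
    return math.log2(1 + snr * sum(abs(x) ** 2 for x in h))

def capacity_miso(h, snr):
    """Eq. 15.2-17: log2(1 + (snr / N_T) * ||h||^2), with N_T = len(h)."""
    return math.log2(1 + snr / len(h) * sum(abs(x) ** 2 for x in h))

h = [1.0, 1j]                       # ||h||^2 = 2 in both configurations
c_simo = capacity_simo(h, snr=10.0)  # log2(21)
c_miso = capacity_miso(h, snr=10.0)  # log2(11)
```

With $\|\mathbf{h}\|_F^2 = 2$ and $\mathcal{E}_s/N_0 = 10$, the SIMO capacity is $\log_2 21$ and the MISO capacity is $\log_2 11$; the difference reflects the factor $1/N_T$ applied to the transmitted energy in the MISO case.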
15.2-3 Capacity of a Frequency-Nonselective Ergodic Random
MIMO Channel


The channel capacity expressions derived in Section 15.2-2 for a deterministic MIMO
channel may be viewed as the capacity for a randomly selected realization of the channel
matrix. To determine the ergodic capacity, we may simply average the expression for
the capacity of the deterministic channel over the statistics of the channel matrix. Thus,
for a SIMO channel, the ergodic capacity, as defined in Chapter 14, is

$$\bar{C}_{\text{SIMO}} = E\left[\log_2\left(1 + \frac{\mathcal{E}_s}{N_0}\sum_{i=1}^{N_R}|h_{i1}|^2\right)\right] = \int_0^\infty \log_2\left(1 + \frac{\mathcal{E}_s}{N_0}x\right)p(x)\,dx \text{ bps/Hz} \tag{15.2-18}$$

where $X = \sum_{i=1}^{N_R}|h_{i1}|^2$ and $p(x)$ is the probability density function of the random
variable $X$.

Figure 15.2-1 illustrates $\bar{C}_{\text{SIMO}}$ versus the average SNR $\mathcal{E}_s E(|h_{i1}|^2)/N_0$ for
$N_R = 2, 4$, and 8 when the channel parameters $\{h_{i1}\}$ are iid complex-valued, zero-mean,
circularly symmetric Gaussian, each having unit variance. Hence, the random
variable $X$ has a chi-squared distribution with $2N_R$ degrees of freedom, and its PDF is
given by Equation 15.2-6. For comparison, the ergodic capacity $\bar{C}_{\text{SISO}}$ is also shown.

FIGURE 15.2-1

Ergodic capacity of SIMO channels.

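Similarly, the ergodic channel capacity for the MISO channel is

$$\bar{C}_{\text{MISO}} = E\left[\log_2\left(1 + \frac{\mathcal{E}_s}{N_T N_0}\sum_{j=1}^{N_T}|h_{1j}|^2\right)\right] = \int_0^\infty \log_2\left(1 + \frac{\mathcal{E}_s}{N_T N_0}x\right)p(x)\,dx \text{ bps/Hz} \tag{15.2-19}$$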


Figure 15.2-2 illustrates $\bar{C}_{\text{MISO}}$ versus the average SNR, as defined above, for
$N_T = 2, 4$, and 8 when the channel parameters $\{h_{1j}\}$ are iid zero-mean, complex-valued,
circularly symmetric Gaussian, each having unit variance. As in the case of
the SIMO channel, the random variable $X$ has a chi-squared distribution, here with $2N_T$
degrees of freedom. The ergodic capacity of a SISO channel is also included in
Figure 15.2-2 for comparison purposes. In comparing the graphs in Figure 15.2-1 with
those in Figure 15.2-2, we observe that $\bar{C}_{\text{SIMO}} > \bar{C}_{\text{MISO}}$.
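The averages in Equations 15.2-18 and 15.2-19 can also be estimated by simulation. The sketch below is one way to do so under the stated iid unit-variance Gaussian assumption, using the fact that each $|h|^2$ is then exponentially distributed with unit mean; the function names, trial count, and seed are illustrative choices.

```python
import math
import random

def chi2_sum(n, rng):
    """Sum of n iid |h|^2 terms with h ~ CN(0, 1): chi-squared, 2n degrees of freedom."""
    # Each |h|^2 is exponential with unit mean for a unit-variance
    # circularly symmetric complex Gaussian h.
    return sum(rng.expovariate(1.0) for _ in range(n))

def ergodic_capacity(n_antennas, snr, split_energy, trials=20000, seed=1):
    """Monte Carlo average of log2(1 + snr_eff * X).

    split_energy=False models SIMO (Equation 15.2-18);
    split_energy=True models MISO (Equation 15.2-19), where Es is divided by N_T.
    """
    rng = random.Random(seed)
    scale = snr / n_antennas if split_energy else snr
    total = 0.0
    for _ in range(trials):
        total += math.log2(1.0 + scale * chi2_sum(n_antennas, rng))
    return total / trials

snr = 10.0  # average SNR of 10 dB, as a linear ratio
results = {n: (ergodic_capacity(n, snr, False), ergodic_capacity(n, snr, True))
           for n in (2, 4, 8)}
```

Each entry of `results` holds the estimated (SIMO, MISO) pair for one antenna count, which can be compared against the trends in Figures 15.2-1 and 15.2-2.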

FIGURE 15.2-2

Ergodic capacity of MISO channels.

To determine the ergodic capacity of the MIMO channel, we average the expression
for $C$ given in Equation 15.2-12 over the joint probability density function of the
eigenvalues $\{\lambda_i\}$. Thus,


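$$\bar{C}_{\text{MIMO}} = E\left\{\sum_{i=1}^{r}\log_2\left(1 + \frac{\mathcal{E}_s}{N_T N_0}\lambda_i\right)\right\} = \int_0^\infty \cdots \int_0^\infty \left[\sum_{i=1}^{r}\log_2\left(1 + \frac{\mathcal{E}_s}{N_T N_0}\lambda_i\right)\right] p(\lambda_1, \ldots, \lambda_r)\,d\lambda_1 \cdots d\lambda_r \tag{15.2-20}$$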


For the case in which the elements of the channel matrix $\mathbf{H}$ are complex-valued
zero-mean Gaussian with unit variance and spatially white with $N_R = N_T = N$, the
joint PDF of $\{\lambda_i\}$ is given by Edelman (1989) as

$$p(\lambda_1, \lambda_2, \ldots, \lambda_N) = \frac{(\pi/2)^{N(N-1)}}{[\Gamma_N(N)]^2}\exp\left(-\sum_{i=1}^{N}\lambda_i\right)\prod_{\substack{i,j \\ i<j}}(2\lambda_i - 2\lambda_j)^2\prod_{i=1}^{N}u(\lambda_i) \tag{15.2-21}$$

where $\Gamma_N(N)$ is the multivariate gamma function defined as

$$\Gamma_N(N) = \pi^{N(N-1)/2}\prod_{i=1}^{N}(N-i)! \tag{15.2-22}$$

Figure 15.2-3 illustrates $\bar{C}_{\text{MIMO}}$ versus the average SNR for $N_T = N_R = 2$ and
$N_T = N_R = 4$. The ergodic capacity of a SISO channel is also included in
Figure 15.2-3 for comparison purposes. We observe that at high SNRs, the capacity of
the $(N_T, N_R) = (4, 4)$ MIMO system is approximately four times the capacity of the
(1, 1) system. Thus, at high SNRs, the capacity increases linearly with the number of
antenna pairs when the channel is spatially white.
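The linear growth at high SNR can be reproduced with a small simulation. The sketch below avoids an explicit eigenvalue decomposition by using the standard identity $\sum_i \log_2(1 + \rho\lambda_i/N_T) = \log_2\det(\mathbf{I} + (\rho/N_T)\mathbf{H}\mathbf{H}^H)$, where $\rho = \mathcal{E}_s/N_0$; this identity, the helper names, and the trial counts are assumptions of the example rather than material from the text.

```python
import math
import random

def complex_gaussian(rng):
    """One CN(0, 1) sample: real and imaginary parts each have variance 1/2."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def log2_det(a):
    """log2 det of a Hermitian positive definite matrix given as a list of rows."""
    n = len(a)
    a = [row[:] for row in a]
    total = 0.0
    for k in range(n):
        pivot = a[k][k]  # real and positive for a positive definite matrix
        total += math.log2(pivot.real)
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return total

def mimo_capacity(h, snr):
    """C = log2 det(I + (snr / N_T) H H^H) for an N_R x N_T matrix h."""
    n_r, n_t = len(h), len(h[0])
    m = [[(1.0 if i == j else 0.0)
          + (snr / n_t) * sum(h[i][k] * h[j][k].conjugate() for k in range(n_t))
          for j in range(n_r)] for i in range(n_r)]
    return log2_det(m)

def ergodic_capacity(n, snr, trials=2000, seed=7):
    """Monte Carlo estimate for an n x n spatially white channel."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        h = [[complex_gaussian(rng) for _ in range(n)] for _ in range(n)]
        total += mimo_capacity(h, snr)
    return total / trials

snr = 1000.0  # 30 dB, a "high SNR" operating point
ratio = ergodic_capacity(4, snr) / ergodic_capacity(1, snr)
```

At this SNR the ratio comes out somewhat below four, consistent with the statement that the (4, 4) capacity is approximately four times the (1, 1) capacity.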


15.2-4 Outage Capacity 

As we have observed, the capacity of a randomly fading channel is a random variable.
For an ergodic channel, its average value $\bar{C}$ is the ergodic capacity. For a nonergodic
channel, a useful performance metric is the probability that the capacity is below some
value for a specified percentage of channel realizations. This performance metric is the
outage capacity, defined in Section 14.2-2.

To be specific, we consider a channel that is known to the receiver only. We assume
that the MIMO channel matrix $\mathbf{H}$ is randomly selected in accordance with each channel
realization and remains constant for each channel use. In other words, we assume that
the channel is quasi-static for the duration of a frame of data, but the channel matrix
may change from frame to frame. Then, for any given frame, the probability

$$P(C < C_p) = P_{\text{out}} \tag{15.2-23}$$

is called the outage probability and the corresponding capacity $C_p$ is called the $100P_{\text{out}}\%$
outage capacity, where the subscript $p$ denotes $P_{\text{out}}$. Hence, the achievable information
rate will exceed $C_p$ for $100(1 - P_{\text{out}})\%$ of the MIMO channel realizations. Equivalently,
if we transmit a large number of frames, the transmission of a frame will fail (contain
errors) with probability $P_{\text{out}}$.

FIGURE 15.2-3

Ergodic capacity of MIMO channels.

To evaluate the outage capacity of a MIMO channel, let us consider a channel matrix
$\mathbf{H}$, whose elements are iid, complex-valued, circularly symmetric, zero-mean Gaussian
with unit variance. Then, for each realization of $\mathbf{H}$, say $\mathbf{H}_k$, the corresponding capacity
$C_k$ is given by Equation 15.2-11 for any SNR $\mathcal{E}_s/N_0$. If we consider the ensemble of all
possible channel realizations for any given SNR, the PDF of $C_k$ may appear as shown
in Figure 15.2-4.

The cumulative distribution function (CDF) is

$$F(C) = P(C_k < C)$$

Figure 15.2-5 illustrates the CDF for $N_T = N_R = 2$ and $N_T = N_R = 4$ MIMO
channels and a SISO channel for an SNR of 10 dB. The outage capacity at some
specified outage probability is easily determined from $F(C)$ for any given SNR.

Figure 15.2-6 illustrates the 10% outage capacity as a function of the SNR for
$N_T = N_R = 2$ and $N_T = N_R = 4$ MIMO channels and for a SISO channel. We
observe that, as in the case of the ergodic capacity, the outage capacity increases as the
SNR is increased and as the number of antennas $N_T = N_R$ increases.
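Reading the outage capacity off the CDF amounts to taking a quantile of the capacity samples. The sketch below illustrates this for the SISO Rayleigh case only, because there the result can be compared with a closed form that follows from $P(|h|^2 < x) = 1 - e^{-x}$; for a MIMO channel the sample generator would be replaced by per-realization capacities from Equation 15.2-11. All names and parameter values are illustrative assumptions.

```python
import math
import random

def outage_capacity(samples, p_out):
    """Empirical C_p: roughly a fraction p_out of the samples fall below it."""
    ordered = sorted(samples)
    index = int(p_out * len(ordered))
    return ordered[index]

def siso_capacity_samples(snr, trials, seed=3):
    """Capacity realizations log2(1 + snr * |h|^2) for h ~ CN(0, 1)."""
    rng = random.Random(seed)
    return [math.log2(1.0 + snr * rng.expovariate(1.0)) for _ in range(trials)]

snr = 10.0  # 10 dB as a linear ratio
samples = siso_capacity_samples(snr, 50000)
c_10 = outage_capacity(samples, 0.10)  # 10% outage capacity

# Closed form for the Rayleigh SISO case: solve 1 - exp(-(2**C - 1)/snr) = 0.10.
c_10_exact = math.log2(1.0 - snr * math.log(1.0 - 0.10))
```

The empirical quantile and the closed form agree to within the sampling error of the simulation.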
FIGURE 15.2-4

Probability density function of channel capacity for an $N_T = N_R = 2$ MIMO channel at
SNR = 10 dB. Horizontal axis: capacity $C_k$ (bps/Hz).

FIGURE 15.2-5

CDF of MIMO channel capacity at SNR = 10 dB.

FIGURE 15.2-6

10% Outage capacity of MIMO channels.


15.2-5 Capacity of MIMO Channel When the Channel Is Known 
at the Transmitter 

We have observed that when the channel matrix H is known only at the receiver, the 
transmitter allocates equal power to the signals transmitted on the multiple transmit 
antennas. On the other hand, if both the transmitter and the receiver know the channel 
matrix, the transmitter can allocate its transmitted power more efficiently and thus 
achieve a higher capacity. 

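Let us consider a MIMO system with $N_T$ transmit antennas and $N_R$ receive antennas
in a frequency-nonselective channel. The channel matrix $\mathbf{H}$ is assumed to be of rank
$r$. Hence, using an SVD, $\mathbf{H}$ is represented as $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^H$. Since $\mathbf{H}$ is known at
the transmitter and the receiver, the transmitted signal vector of dimension $r \times 1$ is
premultiplied by the matrix $\mathbf{V}$, and the received signal is premultiplied by the matrix
$\mathbf{U}^H$ as previously described in Section 15.1-2 and in Figure 15.1-5. The transmitted
signal vector $\mathbf{s}$ has zero-mean, complex-valued Gaussian elements. The sum of the
variances of the elements of $\mathbf{s}$ is constrained to be equal to $N_T$, i.e.,

$$E(\mathbf{s}^H\mathbf{s}) = \sum_{k=1}^{r} E\left[|s_k|^2\right] = \sum_{k=1}^{r}\sigma_{ks}^2 = N_T \tag{15.2-24}$$

Hence, the signal transmitted on the $N_T$ antennas is $\sqrt{\mathcal{E}_s/N_T}\,\mathbf{V}\mathbf{s}$.

The received signal vector is

$$\mathbf{y} = \sqrt{\frac{\mathcal{E}_s}{N_T}}\,\mathbf{H}\mathbf{V}\mathbf{s} + \boldsymbol{\eta} = \sqrt{\frac{\mathcal{E}_s}{N_T}}\,\mathbf{U}\boldsymbol{\Sigma}\mathbf{s} + \boldsymbol{\eta} \tag{15.2-25}$$

After premultiplying $\mathbf{y}$ by $\mathbf{U}^H$, we obtain the transformed $r \times 1$ vector

$$\mathbf{y}' = \mathbf{U}^H\mathbf{y} = \sqrt{\frac{\mathcal{E}_s}{N_T}}\,\boldsymbol{\Sigma}\mathbf{s} + \boldsymbol{\eta}' \tag{15.2-26}$$

where $\boldsymbol{\eta}' = \mathbf{U}^H\boldsymbol{\eta}$.

We observe that the channel characterized by the $N_R \times N_T$ channel matrix is
equivalent to $r$ decoupled SISO channels, whose output is

$$y'_k = \sqrt{\frac{\mathcal{E}_s\lambda_k}{N_T}}\,s_k + \eta'_k, \quad k = 1, 2, \ldots, r \tag{15.2-27}$$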

Therefore, the capacity of the MIMO channel for a specific power allocation at the
transmitter is

$$C(\{\sigma_{ks}^2\}) = \sum_{k=1}^{r}\log_2\left(1 + \frac{\mathcal{E}_s\lambda_k\sigma_{ks}^2}{N_T N_0}\right) \tag{15.2-28}$$

Note that the energy transmitted per symbol on the $k$th subchannel is $\mathcal{E}_s\sigma_{ks}^2/N_T$.
The transmitter allocates its total transmitted power across the $N_T$ antennas so as to
maximize $C(\{\sigma_{ks}^2\})$. Thus, the capacity of the MIMO channel under the optimum
power allocation is

$$C = \max_{\{\sigma_{ks}^2\}}\sum_{k=1}^{r}\log_2\left(1 + \frac{\mathcal{E}_s\lambda_k\sigma_{ks}^2}{N_T N_0}\right) \tag{15.2-29}$$

where the constraint on the $\{\sigma_{ks}^2\}$ is given by Equation 15.2-24. The maximization
in Equation 15.2-29 can be performed by numerical methods. Basically, the solution
satisfies the "water-filling principle," which allocates more power to subchannels that
have low noise power, i.e., according to the ratio $N_0/\lambda_k$, and less power to subchannels
that have high noise power.

For an ergodic channel, the average (ergodic) capacity is determined by averaging
the capacity given in Equation 15.2-29 for a given $\mathbf{H}$ over the channel statistics, i.e.,
over the joint PDF of $\{\lambda_k\}$. Thus,

$$\bar{C} = E\left\{\max_{\{\sigma_{ks}^2\}}\sum_{k=1}^{r}\log_2\left(1 + \frac{\mathcal{E}_s\lambda_k\sigma_{ks}^2}{N_T N_0}\right)\right\} \tag{15.2-30}$$

This computation can be performed numerically when the joint PDF of the eigenvalues
$\{\lambda_k\}$ is known.
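One common way to carry out the maximization in Equation 15.2-29 is the classical water-filling procedure. The sketch below assumes the constraint that the variances sum to $N_T$ and writes $\rho = \mathcal{E}_s/N_0$; the function names and the example eigenvalues are hypothetical, and the routine is offered as an illustration rather than as the method used to produce any result in the text.

```python
import math

def water_filling(eigenvalues, snr, n_t):
    """Variances sigma_k^2 >= 0 with sum n_t that maximize
    sum_k log2(1 + snr * lam_k * sigma_k^2 / n_t)."""
    # Effective noise "floor" of subchannel k, proportional to N0 / lam_k.
    floors = sorted((n_t / (snr * lam), idx)
                    for idx, lam in enumerate(eigenvalues) if lam > 0)
    powers = [0.0] * len(eigenvalues)
    # Try the m strongest subchannels, dropping the weakest until all are positive.
    for m in range(len(floors), 0, -1):
        level = (n_t + sum(f for f, _ in floors[:m])) / m  # water level
        if level > floors[m - 1][0]:
            for f, idx in floors[:m]:
                powers[idx] = level - f
            break
    return powers

def capacity(eigenvalues, powers, snr, n_t):
    """Equation 15.2-28 for a given allocation."""
    return sum(math.log2(1.0 + snr * lam * p / n_t)
               for lam, p in zip(eigenvalues, powers))

lams = [2.0, 0.5]  # hypothetical eigenvalues of H H^H
n_t = 2
high = water_filling(lams, 10.0, n_t)   # both subchannels active
low = water_filling(lams, 0.1, n_t)     # only the strongest subchannel active
```

At high SNR the allocation approaches equal power, while at low SNR all of the energy goes to the subchannel with the largest eigenvalue.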
■ 15.3

SPREAD SPECTRUM SIGNALS AND MULTICODE TRANSMISSION 


In Section 15.1 we demonstrated that a MIMO system transmitting in a frequency-nonselective
fading channel can employ identical narrowband signals for data transmission.
The signals from the $N_T$ transmit antennas were assumed to arrive at the
$N_R$ receive antennas via $N_T N_R$ independently fading propagation paths. By knowing
the channel matrix $\mathbf{H}$, the receiver is able to separate and detect the $N_T$ transmitted
symbols in each signaling interval. Thus, the use of narrowband signals provided a
data rate increase (spatial multiplexing gain) of $N_T$ relative to a single-antenna system
and, simultaneously, a signal diversity of order $N_R$, where $N_R \ge N_T$, when the
maximum-likelihood detector is employed.

In this section we consider a similar MIMO system with the exception that the
transmitted signals on the $N_T$ transmit antennas will be wideband, i.e., spread spectrum
signals.


15.3-1 Orthogonal Spreading Sequences 

The MIMO system under consideration is illustrated in Figure 15.3-1(a). The data
symbols $\{s_j, 1 \le j \le N_T\}$ are each multiplied (spread) by a binary sequence
$\{c_{jk}, 1 \le k \le L_c, 1 \le j \le N_T\}$ consisting of $L_c$ bits, where each bit takes a value of either +1
or $-1$. These binary sequences are assumed to be orthogonal, i.e.,

$$\sum_{k=1}^{L_c} c_{jk}c_{ik} = 0, \quad j \ne i \tag{15.3-1}$$

For example, the orthogonal sequences may be generated from $N_T$ Hadamard codewords
of block length $L_c$, where a 0 in the Hadamard codeword is mapped into a $-1$
and a 1 is mapped into a +1. The resulting orthogonal sequences are usually called
Walsh-Hadamard sequences.

The transmitted signal on the $j$th transmit antenna may be expressed as

$$s_j(t) = s_j\sqrt{\frac{\mathcal{E}_s}{N_T}}\sum_{k=1}^{L_c} c_{jk}\,g(t - kT_c), \quad 0 \le t \le T;\ j = 1, 2, \ldots, N_T \tag{15.3-2}$$

where $\mathcal{E}_s/N_T$ is the energy per transmitted symbol, $T$ is the symbol duration, $T_c = T/L_c$,
and $g(t)$ is a signal pulse of duration $T_c$ and energy $1/L_c$. The pulse $g(t)$ is
usually called a chip, and $L_c$ is the number of chips per information symbol. Thus, the
bandwidth of the information symbols, which is approximately $1/T$, is expanded by
the factor $L_c$, so that the transmitted signal on each antenna occupies a bandwidth of
approximately $1/T_c$.

FIGURE 15.3-1

MIMO system with spread spectrum signals.

The MIMO channel is assumed to be frequency-nonselective and characterized by
the matrix $\mathbf{H}$, which is known to the receiver. At each receiving terminal, the received
signal is passed through a chip matched filter, matched to the chip pulse $g(t)$, and
its sampled output is fed to a bank of $N_T$ correlators whose outputs are sampled at the
end of each signaling interval, as illustrated in Figure 15.3-1(b). Since the spreading
sequences are orthogonal, the $N_T$ correlator outputs at the $m$th receive antenna are
simply expressed as

$$y_{mj} = \sqrt{\frac{\mathcal{E}_s}{N_T}}\,s_j h_{mj} + \eta_{mj}, \quad m = 1, 2, \ldots, N_R;\ j = 1, 2, \ldots, N_T \tag{15.3-3}$$

where $\{\eta_{mj}\}$ denote the additive noise components, which are assumed to be zero-mean,
complex-valued, circularly symmetric Gaussian iid with variance $E[|\eta_{mj}|^2] = \sigma^2$.
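The orthogonality condition in Equation 15.3-1 and the decoupled correlator outputs in Equation 15.3-3 can be illustrated with a chip-level sketch. The discrete-time model below (one sample per chip, correlator output normalized by $L_c$, no noise) and all symbol and channel values are assumptions made for the example.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix with +1/-1 entries
    (n must be a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def spread(symbols, codes, amplitude):
    """Chip samples sent from each antenna: amplitude * s_j * c_jk."""
    return [[amplitude * s * c for c in code] for s, code in zip(symbols, codes)]

def correlate(received, code):
    """Correlator matched to one spreading sequence, normalized by L_c."""
    return sum(r * c for r, c in zip(received, code)) / len(code)

L_c, n_t = 8, 2
codes = hadamard(L_c)[1:1 + n_t]      # any N_T distinct rows are mutually orthogonal
symbols = [1 + 1j, -1 + 1j]           # hypothetical QPSK symbols s_1, s_2
h_m = [0.6 - 0.2j, -0.3 + 0.9j]       # hypothetical gains h_m1, h_m2 to receive antenna m
amplitude = (1.0 / n_t) ** 0.5        # sqrt(Es / N_T) with Es = 1

tx = spread(symbols, codes, amplitude)
# Noise-free chip samples at receive antenna m: superposition of both transmit antennas.
rx = [sum(h_m[j] * tx[j][k] for j in range(n_t)) for k in range(L_c)]
y_m = [correlate(rx, codes[j]) for j in range(n_t)]
```

Each entry of `y_m` equals $\sqrt{\mathcal{E}_s/N_T}\,s_j h_{mj}$, with no contribution from the other transmitted symbol.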

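It is convenient to express the $N_R$ correlator outputs corresponding to the same
transmitted symbol $s_j$ in vector form as

$$\mathbf{y}_j = \sqrt{\frac{\mathcal{E}_s}{N_T}}\,s_j\mathbf{h}_j + \boldsymbol{\eta}_j \tag{15.3-4}$$

where $\mathbf{y}_j = [y_{1j}\ y_{2j}\ \cdots\ y_{N_R j}]^t$, $\mathbf{h}_j = [h_{1j}\ h_{2j}\ \cdots\ h_{N_R j}]^t$, and
$\boldsymbol{\eta}_j = [\eta_{1j}\ \eta_{2j}\ \cdots\ \eta_{N_R j}]^t$.
The optimum combiner is a maximal ratio combiner (MRC) for each of the transmitted
symbols $\{s_j\}$. Thus, the output of the MRC for the $j$th signal is

$$\mu_j = \mathbf{h}_j^H\mathbf{y}_j = \sqrt{\frac{\mathcal{E}_s}{N_T}}\,\|\mathbf{h}_j\|_F^2\,s_j + \mathbf{h}_j^H\boldsymbol{\eta}_j, \quad j = 1, 2, \ldots, N_T \tag{15.3-5}$$

The decision metrics $\{\mu_j\}$ are the inputs to the detector, which makes an independent
decision on each symbol in the set $\{s_j\}$ of transmitted symbols.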

We observe that the use of orthogonal spreading sequences in a MIMO system
transmitting over a frequency-nonselective channel significantly simplifies the detector
and, for a spatially white channel, yields $N_R$-order diversity for each of the transmitted
symbols $\{s_j\}$. The evaluation of the error rate performance of the detector for standard
signal constellations such as PSK and QAM is relatively straightforward.
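The maximal ratio combining step can be sketched in a few lines. The example below forms $\mu_j = \mathbf{h}_j^H\mathbf{y}_j$ for BPSK symbols and decides on the sign of the real part; the noise level, seed, and helper names are illustrative assumptions.

```python
import math
import random

def mrc_metric(h_j, y_j):
    """Equation 15.3-5: mu_j = h_j^H y_j."""
    return sum(h.conjugate() * y for h, y in zip(h_j, y_j))

rng = random.Random(5)
n_t, n_r = 2, 3
amplitude = math.sqrt(1.0 / n_t)  # sqrt(Es / N_T) with Es = 1
sigma = 0.05                      # hypothetical noise standard deviation

def cn(scale):
    """Zero-mean circularly symmetric complex Gaussian sample with E|x|^2 = scale^2."""
    s = scale / math.sqrt(2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

symbols = [1 + 0j, -1 + 0j]  # BPSK symbols s_1, s_2
h = [[cn(1.0) for _ in range(n_r)] for _ in range(n_t)]  # h[j] holds the vector h_j
y = [[amplitude * symbols[j] * h[j][m] + cn(sigma) for m in range(n_r)]
     for j in range(n_t)]

decisions = []
for j in range(n_t):
    mu = mrc_metric(h[j], y[j])
    decisions.append(1 if mu.real >= 0 else -1)
```

Because each symbol is weighted by $\|\mathbf{h}_j\|_F^2$, a deep fade on one receive antenna is compensated by the other antennas, which is the source of the $N_R$-order diversity.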


Frequency-Selective Channel If the channel is frequency-selective, the orthogonality
property of the spreading sequences no longer holds at the receiver. That is, the
channel multipath results in multiple received signal components which are offset in
time. Consequently, the correlator outputs at each of the antennas contain the desired
symbol plus the other $N_T - 1$ transmitted symbols, each scaled by the corresponding
cross-correlations between pairs of sequences. Due to the presence of intersymbol
interference, the MRC is no longer optimum. Instead, the optimum detector is a joint
maximum-likelihood detector for the $N_T$ transmitted symbols received at the $N_R$ receive
antennas.

In general, the implementation complexity of the optimum detector in a frequency-selective
channel is extremely high. In such channels, a suboptimum receiver may be
employed. A receiver structure that is readily implemented in a MIMO frequency-selective
channel employs adaptive equalizers at each of the $N_R$ receivers prior to
despreading the spread spectrum signals. Figure 15.3-2 illustrates the basic receiver
structure. The received signal at each receive antenna is sampled at some multiple
of the chip rate and fed to a parallel bank of $N_T$ fractionally spaced linear equalizers,
whose outputs are sampled at the chip rate. After combining the respective $N_R$
equalizer outputs, the $N_T$ signals are despread and fed to the detector, as illustrated in
Figure 15.3-2. Alternatively, DFEs may be used, where the feedback filters are operated
at the symbol rate.

FIGURE 15.3-2

A MIMO receiver structure for a frequency-selective channel.

Training signals for the equalizers may be provided to the receiver by transmitting
a pilot signal from each transmit antenna. These pilot signals may be spread spectrum
signals that are simultaneously transmitted along with the information-bearing
signals. Using the pilot signals, the equalizer coefficients can be adjusted recursively
by employing an LMS- or RLS-type algorithm.


15.3-2 Multiplexing Gain Versus Diversity Gain 

As we have observed from our previous discussion, the use of orthogonal spreading
sequences to transmit multiple data symbols makes it possible for the receiver to separate
the data symbols by correlating the received signal with each of the spreading sequences.
For example, let us consider the MISO system shown in Figure 15.3-3, which has
$N_T$ transmit antennas and one receive antenna. As shown, $N_T$ different symbols are
transmitted simultaneously on the $N_T$ transmit antennas. The receiver employs a parallel
bank of $N_T$ correlators. Thus, the output of the $j$th correlator is

$$y_j = \sqrt{\frac{\mathcal{E}_s}{N_T}}\,s_j h_j + \eta_j, \quad j = 1, 2, \ldots, N_T \tag{15.3-6}$$

where $h_j$ is the complex-valued channel parameter associated with the propagation
of the $j$th transmitted signal. Hence, the detector computes the decision variables
$\{y_j h_j^*, j = 1, 2, \ldots, N_T\}$ and makes an independent decision on each transmitted
symbol. In this configuration, the MISO system achieves a multiplexing gain (increase in
data rate) of $N_T$, but there is no diversity gain. Alternatively, if two or more transmitting
antennas transmit the same information symbol, the receiver can employ a maximal
ratio combiner to combine the received signals carrying the same information and, thus,
achieve an order of diversity of 2 or more at the expense of reducing the multiplexing
gain. If all $N_T$ transmit antennas are used to transmit the same information symbol,
the receiver can achieve $N_T$-order diversity, but there would be no multiplexing gain.
Thus, we observe that there is a tradeoff between multiplexing gain and diversity gain.

FIGURE 15.3-3

MISO system with spread spectrum signals.

More generally, in a MIMO system with $N_T$ transmit antennas and $N_R$ receive
antennas, the multiplexing gain can vary from 1 to $N_T$ and the diversity gain can
vary from $N_R N_T$ to $N_R$, respectively. Thus, an increase in diversity gain is offset
by a corresponding decrease in multiplexing gain and vice versa. Although we have
described this tradeoff between multiplexing gain and diversity gain in the context
of orthogonal spreading sequences, this tradeoff is also appropriate in the context of
narrowband signals.


15.3-3 Multicode MIMO Systems 

In Sections 15.3-1 and 15.3-2, we considered spread spectrum MIMO systems in 
which a single sequence was used at each transmitting antenna to spread a single 
information symbol. However, it is possible to employ multiple orthogonal sequences 
at each transmitting antenna, to transmit multiple information symbols and thus to 
increase the data rate. 

Figure 15.3-4 illustrates this concept with the use of two transmit and two receive
antennas ($N_R = N_T = 2$). There are $K$ orthogonal spreading sequences that are used
to spread the spectrum of $K$ information symbols at each transmitter. The same $K$
spreading sequences are used at all the transmitters. Thus, with $N_T$ transmit antennas
there are $KN_T$ information symbols that are transmitted simultaneously. At each transmitter,
the sum of $K$ spread signals is multiplied by a pseudorandom sequence $\mathbf{p}_j$,
called a scrambling sequence, consisting of statistically independent, equally probable
+1s and $-1$s occurring at the chip rate of the orthogonal sequences $\{\mathbf{c}_k\}$. The scrambling
sequences used at the $N_T$ different transmitters are assumed to be statistically
independent. These scrambling sequences serve as a means to separate (orthogonalize)
the transmissions among the $N_T$ transmit antennas, and have a length $L_s$, which may
be equal to or larger than the length $L_c$ of the orthogonal sequences, where $L_c$ is the
number of chips per information symbol. The scrambled orthogonal signals at each
antenna may be expressed as

$$s_j(t) = \sqrt{\frac{\mathcal{E}_s}{KN_T}}\sum_{k=1}^{K}s_{jk}\sum_{l=1}^{L_c}p_{jl}\,c_{kl}\,g(t - lT_c), \quad j = 1, 2, \ldots, N_T \tag{15.3-7}$$

where $\mathbf{p}_j$ is the scrambling sequence at the $j$th transmitter, $\mathbf{s}_j = [s_{j1}\ s_{j2}\ \cdots\ s_{jK}]^t$ is the
vector of information symbols transmitted from the $j$th antenna, $\mathbf{c}_k = [c_{k1}\ c_{k2}\ \cdots\ c_{kL_c}]$
is the $k$th orthogonal spreading sequence, $g(t)$ is the chip signal pulse of duration $T_c$ and
energy $1/L_c$, and $\mathcal{E}_s/KN_T$ is the average energy per transmitted information symbol
at each antenna.

FIGURE 15.3-4

Modulator and demodulator for a multicode MIMO system.

At each receive antenna, the received signals are passed through a chip matched
filter and sampled at the chip rate. The samples at the output of the chip matched
filters are descrambled and cross-correlated with each of the $K$ orthogonal sequences.
The correlator outputs are sampled at the symbol rate. Assuming that the scrambling
sequences are orthogonal, these samples may be expressed as

$$\mathbf{y}_{jk} = \sqrt{\frac{\mathcal{E}_s}{KN_T}}\,s_{jk}\mathbf{h}_j + \boldsymbol{\eta}_{jk}, \quad j = 1, 2, \ldots, N_T;\ k = 1, 2, \ldots, K \tag{15.3-8}$$

where $\mathbf{y}_{jk} = [y_{1jk}\ y_{2jk}\ \cdots\ y_{N_R jk}]^t$, $\mathbf{h}_j = [h_{1j}\ h_{2j}\ \cdots\ h_{N_R j}]^t$, and
$\boldsymbol{\eta}_{jk} = [\eta_{1jk}\ \eta_{2jk}\ \cdots\ \eta_{N_R jk}]^t$ is the additive Gaussian noise vector. Thus, the transmitted
symbols are decoupled by use of orthogonal scrambling and spreading sequences. These samples are fed
to the maximal ratio combiner, which computes the metrics

$$\mu_{jk} = \mathbf{h}_j^H\mathbf{y}_{jk} = \sqrt{\frac{\mathcal{E}_s}{KN_T}}\,\|\mathbf{h}_j\|_F^2\,s_{jk} + \mathbf{h}_j^H\boldsymbol{\eta}_{jk}, \quad j = 1, 2, \ldots, N_T;\ k = 1, 2, \ldots, K \tag{15.3-9}$$

These metrics are passed to the detector, which makes a decision on each of the transmitted
information symbols based on a Euclidean distance criterion. We should note
that if the scrambling sequences are not orthogonal, we have intersymbol interference
among the symbols transmitted on the $N_T$ antennas. In such a case, a multisymbol (or
multiuser) detector must be employed.

In a frequency-selective channel, the orthogonality among the multiple codes is destroyed.
In such channels, a practical implementation of the receiver employs adaptive
equalizers to restore the orthogonality of the codes and mitigate the effects of interchip
and intersymbol interference. Figure 15.3-5 illustrates such a receiver structure.
Training signals for the equalizers are usually provided to the receiver by transmitting
a pilot signal from each transmit antenna. These pilot signals may be spread spectrum
signals that are simultaneously transmitted along with the information-bearing signals.
For example, the pilot signals may be transmitted with the spreading code $\mathbf{c}_1$ at each
transmit antenna. Using the pilot signals, the equalizer coefficients can be adjusted
recursively by employing either an LMS or RLS type of algorithm.

FIGURE 15.3-5

Receiver structure for a MIMO multicode system in a frequency-selective MIMO channel.

■ 15.4

CODING FOR MIMO CHANNELS 

In this section we describe two different approaches to code design for MIMO channels 
and evaluate their performance for frequency-nonselective Rayleigh fading channels. 
The first approach is based on using conventional block or convolutional codes with 
interleaving to achieve signal diversity. The second approach is based on code design 
that is tailored for multiple-antenna systems. The resulting codes are called space-time 
codes. We begin by recapping the error rate performance of coded SISO systems in 
Rayleigh fading channels. 


15.4-1 Performance of Temporally Coded SISO Systems 
in Rayleigh Fading Channels 


Let us consider a SISO system, as shown in Figure 15.4-1, where the fading channel
is frequency-nonselective and the fading process is Rayleigh-distributed. The encoder
generates either an $(n, k)$ linear binary block code or an $(n, k)$ binary convolutional
code. The interleaver is assumed to be sufficiently long that the transmitted signals
conveying the coded bits fade independently. The modulation is binary PSK, DPSK,
or FSK.

The error probabilities for the coded SISO channel with Rayleigh fading are given
in Sections 14.4 and 14.7. Let us consider linear block codes first. From Section 7.2-4,
the union bound on the codeword error probability for soft decision decoding is

$$P_e \le \sum_{m=2}^{M} P_2(w_m) \le (M - 1)P_2(d_{\min}) < 2^k P_2(d_{\min}) \tag{15.4-1}$$

where $P_2(w_m)$ is the pairwise error probability given by the expression (see Section 14.7-1)

$$P_2(w_m) = \left(\frac{1 - \psi}{2}\right)^{w_m}\sum_{k=0}^{w_m - 1}\binom{w_m - 1 + k}{k}\left(\frac{1 + \psi}{2}\right)^k \tag{15.4-2}$$

and

$$\psi = \begin{cases} \sqrt{\dfrac{\bar{\gamma}_b R_c}{1 + \bar{\gamma}_b R_c}} & \text{BPSK} \\[2ex] \bar{\gamma}_b R_c/(1 + \bar{\gamma}_b R_c) & \text{DPSK} \\[1ex] \bar{\gamma}_b R_c/(2 + \bar{\gamma}_b R_c) & \text{FSK (noncoherent detection)} \end{cases} \tag{15.4-3}$$

FIGURE 15.4-1

Temporally coded SISO system.

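For simplicity, we will use the simpler (looser) upper bound obtained by assuming
that $\bar{\gamma}_b \gg 1$ in the expression for $P_2(d_{\min})$. Thus, we obtain

$$P_e < 2^k P_2(d_{\min}) < 2^k\binom{2d_{\min} - 1}{d_{\min}}\left(\frac{1}{q\bar{\gamma}_b R_c}\right)^{d_{\min}} \tag{15.4-4}$$

where

$$q = \begin{cases} 4 & \text{BPSK} \\ 2 & \text{DPSK} \\ 1 & \text{FSK (noncoherent detection)} \end{cases} \tag{15.4-5}$$

We observe that for soft decision decoding, the error probability decays as
$(1/\bar{\gamma}_b R_c)^{d_{\min}}$, i.e., inversely with the SNR raised to a power equal to $d_{\min}$, the
minimum Hamming distance of the block code.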

For hard decision decoding, we employ the Chernov bound given in Section 14.4,
which may be expressed as

$$P_e < 2^k[4p(1 - p)]^{d_{\min}/2} \tag{15.4-6}$$

where the error probability per coded bit is given as

$$p = \frac{1 - \psi}{2} \tag{15.4-7}$$

and $\psi$ is defined in Equation 15.4-3. For $\bar{\gamma}_b \gg 1$, the Chernov bound simplifies to

$$P_e < 2^k\left(\frac{4}{q\bar{\gamma}_b R_c}\right)^{d_{\min}/2} \tag{15.4-8}$$

where $q$ is defined in Equation 15.4-5. As in the case of soft decision decoding, the error
probability decays as a power of $1/\bar{\gamma}_b R_c$; however, the exponent for hard decision
decoding is $d_{\min}/2$. Therefore, soft decision decoding provides twice the signal diversity
that is obtained by hard decision decoding.
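The factor-of-two difference in diversity can be seen by evaluating the pairwise bounds numerically. The sketch below assumes the standard closed form for the pairwise error probability with $d$-order diversity in Rayleigh fading (used here for Equation 15.4-2), the values $q = 4, 2, 1$ for Equation 15.4-5, and the high-SNR forms $\binom{2d-1}{d}(1/q\bar{\gamma}_b R_c)^d$ and $(4/q\bar{\gamma}_b R_c)^{d/2}$; the function names are hypothetical.

```python
import math

Q = {"BPSK": 4.0, "DPSK": 2.0, "FSK": 1.0}  # assumed values of q in Equation 15.4-5

def psi(snr_b, rate, modulation):
    """Equation 15.4-3, with snr_b the average SNR per bit (linear)."""
    g = snr_b * rate
    if modulation == "BPSK":
        return math.sqrt(g / (1.0 + g))
    if modulation == "DPSK":
        return g / (1.0 + g)
    return g / (2.0 + g)  # noncoherent FSK

def pairwise_exact(d, snr_b, rate, modulation):
    """Pairwise error probability of Equation 15.4-2 for Hamming distance d."""
    p = (1.0 - psi(snr_b, rate, modulation)) / 2.0
    return p ** d * sum(math.comb(d - 1 + k, k) * (1.0 - p) ** k for k in range(d))

def pairwise_soft(d, snr_b, rate, modulation):
    """High-SNR soft-decision form: C(2d-1, d) * (1 / (q * snr_b * rate))**d."""
    return math.comb(2 * d - 1, d) * (1.0 / (Q[modulation] * snr_b * rate)) ** d

def pairwise_hard(d, snr_b, rate, modulation):
    """High-SNR Chernov form for hard decisions: (4 / (q * snr_b * rate))**(d/2)."""
    return (4.0 / (Q[modulation] * snr_b * rate)) ** (d / 2.0)

# A 10 dB increase in SNR lowers the soft-decision bound by 10**d
# but the hard-decision bound only by 10**(d/2).
d, rate = 5, 0.5
soft_drop = pairwise_soft(d, 100.0, rate, "BPSK") / pairwise_soft(d, 1000.0, rate, "BPSK")
hard_drop = pairwise_hard(d, 100.0, rate, "BPSK") / pairwise_hard(d, 1000.0, rate, "BPSK")
```

On a log-log plot of error probability versus SNR, these drops correspond to slopes of $d$ and $d/2$, respectively.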

For convolutional codes with soft decision decoding, we use the union bound
derived in Section 14.3, namely,

$$P_b < \sum_{d=d_{\text{free}}}^{\infty}\beta_d P_2(d) \tag{15.4-9}$$

where $P_2(d)$ is given by Equation 15.4-2 and $\psi$ is defined by Equation 15.4-3. If
$\bar{\gamma}_b \gg 1$, we obtain the simpler form for the pairwise error probability, i.e.,

$$P_2(d) < \binom{2d - 1}{d}\left(\frac{1}{q\bar{\gamma}_b R_c}\right)^{d} \tag{15.4-10}$$

where $q$ is defined by Equation 15.4-5. We observe that the leading term in Equation
15.4-9 has an exponent of $d = d_{\text{free}}$. Hence, for soft decision decoding, the leading
term in the error probability decays as a power of $1/\bar{\gamma}_b R_c$, where the exponent is
$d_{\text{free}}$, the free distance of the convolutional code.

For hard decision decoding, we again use the Chernov bound for the pairwise error
probability

$$P_2(d) < [4p(1 - p)]^{d/2} \tag{15.4-11}$$

where $p$ is defined by Equation 15.4-7 and $\psi$ is defined by Equation 15.4-3. Hence,
with $\bar{\gamma}_b \gg 1$, $P_2(d)$ simplifies to

$$P_2(d) < \left(\frac{4}{q\bar{\gamma}_b R_c}\right)^{d/2} \tag{15.4-12}$$

and the bit error probability is upper-bounded as

$$P_b < \sum_{d=d_{\text{free}}}^{\infty}\beta_d\left(\frac{4}{q\bar{\gamma}_b R_c}\right)^{d/2} \tag{15.4-13}$$

As in the case of block codes, we observe that with hard decision decoding, the signal
diversity achieved by the code is reduced by a factor of 2 compared with soft decision
decoding.

With this background on the performance of coded SISO systems, we now consider
the performance of coded MIMO systems.


15.4-2 Bit-Interleaved Temporal Coding for MIMO Channels

We consider the MIMO system as shown in Figure 15.4-2, which has $N_T$ transmit
antennas and $N_R$ receive antennas ($N_R \ge N_T$). The encoder may generate either a
binary block code or a convolutional code. The interleaver is selected to be sufficiently
long that the coded bits in a block of the block code or in several constraint
lengths of the convolutional code fade independently. The MIMO channel is assumed
to be frequency-nonselective with zero-mean, complex-valued, circularly symmetric
Gaussian distributed coefficients $\{h_{ij}\}$, which are identically distributed and mutually
statistically independent. The channel matrix $\mathbf{H}$ is assumed to have full rank.

The demodulator output in each signal interval is the vector $\mathbf{y}$ given by Equation 15.1-10.
For hard decision decoding, the vector $\mathbf{y}$ is fed to the detector, which
may employ any of the three detection algorithms (MLD, MMSE, ICD) described in
Section 15.1-2 to make the hard decisions on the transmitted bits. For soft decision
decoding, the vector $\mathbf{y}$, after deinterleaving, is fed to the decoder. Similarly, for hard
decision decoding, the bits from the detector output are deinterleaved and fed to the
decoder.

FIGURE 15.4-2

Bit-interleaved temporally coded MIMO system.

Let us consider the amount of signal diversity that is achieved in the MIMO system
that employs spatial multiplexing of $N_T$. Recall from Section 15.1-2 that with
hard decision detection in an uncoded system, we achieved $(N_R - N_T + 1)$-order
signal diversity with linear detection and $N_R$-order signal diversity with the optimum
maximum-likelihood detector (MLD). From our discussion in Section 15.4-1, we observed
that the code provides a diversity of order $d_{\min}/2$ or $d_{\text{free}}/2$. Therefore, in a coded
MIMO system, the total signal diversity achieved with a linear detector and a hard decision
decoder is $(N_R - N_T + 1)d_{\min}/2$ or $(N_R - N_T + 1)d_{\text{free}}/2$. On the other hand,
if soft decision decoding is employed, the total diversity order is $N_R d_{\min}$ or $N_R d_{\text{free}}$.
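These diversity orders are easy to tabulate. The helper below is a small sketch of the rules stated above; the function name and argument conventions are hypothetical, and the MLD case with hard decision decoding is taken to be $N_R d/2$ by combining the uncoded $N_R$-order diversity with the code's $d/2$, which is an inference rather than a statement from the text.

```python
def diversity_order(n_t, n_r, code_distance, detector, decoding):
    """Total diversity order of a bit-interleaved coded MIMO system.

    code_distance is d_min (block code) or d_free (convolutional code);
    detector is "linear" or "mld"; decoding is "hard" or "soft".
    """
    if decoding == "soft":
        return n_r * code_distance
    spatial = n_r - n_t + 1 if detector == "linear" else n_r
    return spatial * code_distance / 2

# Rate-1/2 convolutional code with d_free = 5.
print(diversity_order(2, 2, 5, "linear", "hard"))  # (2 - 2 + 1) * 5 / 2 = 2.5
print(diversity_order(2, 3, 5, "linear", "hard"))  # (3 - 2 + 1) * 5 / 2 = 5.0
print(diversity_order(2, 3, 5, "mld", "soft"))     # 3 * 5 = 15
```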

We demonstrate the additional diversity achieved with coding and bit-interleaving
by computer simulation of the MIMO system shown in Figure 15.4-2 for a rate $R_c = 1/2$
convolutional code with $d_{\text{free}} = 5$ and BPSK modulation. Figures 15.4-3 and
15.4-4 illustrate the performance of the MIMO system for binary PSK with hard
decision decoding and soft decision decoding, for $(N_T, N_R) = (2, 2)$ and $(N_T, N_R) = (2, 3)$.
We observe that coding with interleaving improves the performance of the MIMO
system relative to the performance of the uncoded system at the cost of a reduction in the
data throughput rate by the reciprocal of the code rate. For $(N_T, N_R) = (2, 3)$ and hard
decision decoding, the MMSE detector with coding performs almost as well as the MLD
detector with coding. In this case, the signal diversity provided by the convolutional
code enhances the performance of the MMSE detected data more than the performance
of the MLD detected data. We also observe that maximum-likelihood, soft decision
decoding is significantly better than MLD with hard decision decoding. For example,
at an error rate of $10^{-5}$, the difference in performance is more than 5 dB for $(N_T, N_R) = (2, 3)$. This
performance advantage is due to the factor of 2 difference in the order of diversity
achieved by the two types of decoders.

FIGURE 15.4-3

Performance of coded ($R_c = 1/2$, $d_{\text{free}} = 5$) systems with $N_T = N_R = 2$.

FIGURE 15.4-4

Performance of coded ($R_c = 1/2$, $d_{\text{free}} = 5$) systems with $N_T = 2$, $N_R = 3$.

Also plotted in Figures 15.4-3 and 15.4-4 is the ideal performance of rate 1/2,
$d_{\text{free}} = 5$ coded SIMO $(N_T, N_R) = (1, 2)$ and $(N_T, N_R) = (1, 3)$ systems. The signal
diversity achieved by these two systems with soft decision decoding is 10 and 15,
respectively. We observe that there is about a 2-dB degradation at $P_b = 10^{-5}$ in the
performance of the soft decision decoded (2, 2) and (2, 3) MIMO systems compared to
the ideal performance of the corresponding SIMO systems. This loss in performance is
attributed to the interference resulting from the use of multiple transmitting antennas.

The simulation results shown in Figures 15.4-3 and 15.4-4 serve to reinforce our 
analytical results on the signal diversity provided by coding with bit interleaving in 
a MIMO system. The performance superiority of maximum-likelihood soft decision 
decoding over hard decision decoding is clearly evident in these simulation results. 

In this section we employed a single encoder and a single interleaver to generate
the coded symbols for transmission on the $N_T$ antennas and a single deinterleaver and
decoder at the receiver. An alternative approach that has been considered in the literature
is to employ separate but identical encoding and interleaving on the demultiplexed
streams fed to each of the transmit antennas. This approach requires $N_T$ parallel encoders
and interleavers at the transmitter and $N_T$ parallel decoders and deinterleavers
at the receiver. It is especially suitable for situations where multiple data streams from
different users are to be transmitted in parallel on multiple transmit antennas.


15.4-3 Space-Time Block Codes for MIMO Channels 


Let us now consider the MIMO system illustrated in Figure 15.4-5. At the transmitter,
the sequence of information bits is fed to a block encoder that maps a block of bits
into signal points selected from a signal constellation such as PAM, PSK, or QAM,
consisting of $M = 2^b$ signal points. The signal points generated by the encoder as a
block are fed to a parallel set of identical modulators which map the signal points into
corresponding waveforms that are transmitted simultaneously on the $N_T$ antennas.

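A space-time block code (STBC) is defined by a generator matrix $\mathbf{G}$, having $N$
rows and $N_T$ columns, of the form

$$\mathbf{G} = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1N_T} \\ g_{21} & g_{22} & \cdots & g_{2N_T} \\ \vdots & \vdots & \ddots & \vdots \\ g_{N1} & g_{N2} & \cdots & g_{NN_T} \end{bmatrix} \tag{15.4-14}$$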


in which the elements $\{g_{ij}\}$ are signal points resulting from a mapping of information 
bits to corresponding signal points from a binary or $M$-ary signal constellation. By 
employing $N_T$ transmit antennas, each row of $G$ consisting of $N_T$ signal points (symbols) 
is transmitted on the $N_T$ antennas in a time slot. Thus, the first row of $N_T$ symbols 
is transmitted on the $N_T$ antennas in the first time slot, the second row of $N_T$ symbols is 
transmitted on the $N_T$ antennas in the second time slot, and the $N$th row of $N_T$ symbols 
is transmitted on the $N_T$ antennas in the $N$th time slot. Therefore, $N$ time slots are used 
to transmit the symbols in the $N$ rows of the generator matrix $G$. 

FIGURE 15.4-5 

Space-time block coded MIMO system. 

In the design of the generator matrix of a STBC, it is desirable to focus on three 
principal objectives: (1) achieving the highest possible diversity of NtNr, (2) achieving 
the highest possible spatial rate, and (3) minimizing the complexity of the decoder. Our 
treatment considers these three objectives. 


The Alamouti STBC 

Alamouti (1998) devised a STBC for Nt = 2 transmit antennas and Nr = 1 receive 
antenna. The generator matrix for the Alamouti code is given as 


$$G = \begin{bmatrix} s_1 & s_2 \\ -s_2^* & s_1^* \end{bmatrix} \tag{15.4-15}$$

where $s_1$ and $s_2$ are two signal points selected from an $M$-ary PAM, PSK, or QAM 
signal constellation with $M = 2^b$ signal points. Thus, $2b$ data bits are mapped into two 
signal points (symbols) $s_1$ and $s_2$ from the $M$-ary signal constellation. The symbols $s_1$ 
and $s_2$ are transmitted on the two antennas in the first time slot, and the symbols $-s_2^*$ 
and $s_1^*$ are transmitted on the two antennas in the second time slot. Thus, two symbols, 
$s_1$ and $s_2$, are transmitted in two time slots. Consequently, the spatial code rate $R_s = 1$ 
for the Alamouti code. This is the highest possible rate for an (orthogonal) STBC. 

The MISO channel matrix for the $N_T = 2$, $N_R = 1$ channel, based on a frequency-nonselective 
model, is 

$$H = \begin{bmatrix} h_{11} & h_{12} \end{bmatrix} \tag{15.4-16}$$

In the decoding of the STBC, we assume that $H$ is constant over the two time slots. 
Consequently, the signal at the output of the matched filter demodulator of the receiver 
in the two time slots is 

$$\begin{aligned} y_1 &= h_{11}s_1 + h_{12}s_2 + \eta_1 \\ y_2 &= -h_{11}s_2^* + h_{12}s_1^* + \eta_2 \end{aligned} \tag{15.4-17}$$

where $\eta_1$ and $\eta_2$ are zero-mean, circularly symmetric complex-valued uncorrelated 
Gaussian random variables with equal variance $\sigma^2$. 

Let us consider ML decoding of the symbols in Equation 15.4-17, with the objective 
of achieving the full diversity of the STBC. Since $\eta_1$ and $\eta_2$ are uncorrelated zero-mean 
Gaussian random variables with equal variance, the joint conditional PDF of $y_1$ and $y_2$ 
is 

$$p(y_1, y_2 \mid h_{11}, h_{12}, s_1, s_2) = \frac{1}{(2\pi\sigma^2)^2} \exp\left\{ -\frac{|y_1 - h_{11}s_1 - h_{12}s_2|^2 + |y_2 + h_{11}s_2^* - h_{12}s_1^*|^2}{2\sigma^2} \right\} \tag{15.4-18}$$

Therefore, the Euclidean distance metric for ML decoding is 

$$\mu(s_1, s_2) = |y_1 - h_{11}s_1 - h_{12}s_2|^2 + |y_2 + h_{11}s_2^* - h_{12}s_1^*|^2 \tag{15.4-19}$$

The optimum ML decoder computes the Euclidean metrics $\mu(s_1, s_2)$ for each possible 
pair of symbols and selects the symbol pair that results in the smallest metric. 

The computational complexity of the ML decoding procedure is exponential in 
the number of bits per symbol; i.e., there are $M^2 = 2^{2b}$ symbol pairs in the above metric 
computations. However, the computational complexity can be reduced if we expand the 
right-hand side of Equation 15.4-19 and drop the term $|y_1|^2 + |y_2|^2$, which is irrelevant 
to the decision. The cross terms involving the product $s_1 s_2^*$ cancel, and we obtain 

$$\begin{aligned} \mu(s_1, s_2) &= |s_1|^2\left[|h_{11}|^2 + |h_{12}|^2\right] - 2\,\mathrm{Re}\left[y_1^* h_{11} s_1 + y_2 h_{12}^* s_1\right] \\ &\quad + |s_2|^2\left[|h_{11}|^2 + |h_{12}|^2\right] - 2\,\mathrm{Re}\left[y_1^* h_{12} s_2 - y_2 h_{11}^* s_2\right] \\ &= \mu(s_1) + \mu(s_2) \end{aligned} \tag{15.4-20}$$

Now, we observe that the metrics $\mu(s_1)$ and $\mu(s_2)$ can be computed separately; i.e., 
we determine the symbol $s_1$ that minimizes $\mu(s_1)$ and the symbol $s_2$ that minimizes 
$\mu(s_2)$. Thus, the computational complexity is significantly reduced from computing $M^2$ 
metrics to $2M$ metrics. 

A further simplification in decoding results when the signal points in the constellation 
have equal energy, as in PSK constellations. In such a case, the bias energy 
terms $|s_1|^2[|h_{11}|^2 + |h_{12}|^2]$ and $|s_2|^2[|h_{11}|^2 + |h_{12}|^2]$ can be ignored. Furthermore, the 
metrics $\mu(s_1)$ and $\mu(s_2)$ can be rearranged as correlation metrics, defined as 

$$\begin{aligned} \mu_c(s_1) &= \mathrm{Re}\left[y_1^* h_{11} s_1 + y_2 h_{12}^* s_1\right] \\ \mu_c(s_2) &= \mathrm{Re}\left[y_1^* h_{12} s_2 - y_2 h_{11}^* s_2\right] \end{aligned} \tag{15.4-21}$$

That is, we correlate $y_1^*$ with all possible values of $s_1$, scaled by $h_{11}$, and $y_2$ with all 
possible values of $s_1$, scaled by $h_{12}^*$, and select the $s_1$ that results in the largest correlation 
metric $\mu_c(s_1)$. A similar computation is performed to find the value of $s_2$ that yields the 
largest $\mu_c(s_2)$. 

For PAM and QAM signal constellations, the correlation metrics include the bias 
terms in Equation 15.4-20. Hence, the correlation metrics may be expressed as 

$$\begin{aligned} \mu_c(s_1) &= 2\,\mathrm{Re}\left[y_1^* h_{11} s_1 + y_2 h_{12}^* s_1\right] - |s_1|^2\left[|h_{11}|^2 + |h_{12}|^2\right] \\ \mu_c(s_2) &= 2\,\mathrm{Re}\left[y_1^* h_{12} s_2 - y_2 h_{11}^* s_2\right] - |s_2|^2\left[|h_{11}|^2 + |h_{12}|^2\right] \end{aligned} \tag{15.4-22}$$

It is interesting to note that for the particular symbol $s_1$ that is contained in $y_1$ and 
$y_2$, the signal component in the metric $\mu_c(s_1)$ is the largest possible and has the value 

$$E[\mu_c(s_1)] = |s_1|^2\left[|h_{11}|^2 + |h_{12}|^2\right] \tag{15.4-23}$$

where the expectation is taken over the additive Gaussian noise. Similarly, we have 

$$E[\mu_c(s_2)] = |s_2|^2\left[|h_{11}|^2 + |h_{12}|^2\right] \tag{15.4-24}$$

Since each signal term contains the term $[|h_{11}|^2 + |h_{12}|^2]$, the ML decoder achieves a 
diversity of order 2, which is the maximum possible diversity with $N_T = 2$ and $N_R = 1$ 
antennas. 

Instead of computing the correlation metrics as defined in Equation 15.4-22, an 
equivalent detector (see Problem 15.15) computes the estimates of the symbols $s_1$ and 
$s_2$ as follows: 

$$\begin{aligned} \hat{s}_1 &= y_1 h_{11}^* + y_2^* h_{12} \\ \hat{s}_2 &= y_1 h_{12}^* - y_2^* h_{11} \end{aligned} \tag{15.4-25}$$

and it selects the symbols $\tilde{s}_1$ and $\tilde{s}_2$ that are closest to $\hat{s}_1$ and $\hat{s}_2$ in Euclidean distance. 
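
The decoupling described above can be checked numerically. The following is a minimal sketch in pure Python, not the book's implementation: the function names, the QPSK constellation, and the test values are all illustrative assumptions. It forms the received samples of Equation 15.4-17, the estimates of Equation 15.4-25, and compares the two symbol-by-symbol decisions against an exhaustive search over all $M^2$ pairs using the metric of Equation 15.4-19. Dividing each estimate by $|h_{11}|^2 + |h_{12}|^2$ before the nearest-point search is an added assumption so that the same routine also works for constellations with unequal symbol energies.

```python
import cmath

# Unit-energy QPSK constellation (an illustrative choice).
QPSK = [cmath.exp(1j * cmath.pi * k / 2) for k in range(4)]

def alamouti_receive(s1, s2, h11, h12, n1=0j, n2=0j):
    """Matched-filter outputs of Equation 15.4-17 for one code block."""
    y1 = h11 * s1 + h12 * s2 + n1
    y2 = -h11 * s2.conjugate() + h12 * s1.conjugate() + n2
    return y1, y2

def alamouti_estimates(y1, y2, h11, h12):
    """Decoupled estimates of Equation 15.4-25 (note the conjugate on y2)."""
    s1_hat = y1 * h11.conjugate() + y2.conjugate() * h12
    s2_hat = y1 * h12.conjugate() - y2.conjugate() * h11
    return s1_hat, s2_hat

def nearest(points, z, gain):
    """Constellation point closest to z after removing the channel gain."""
    return min(points, key=lambda p: abs(z / gain - p))

def joint_ml(points, y1, y2, h11, h12):
    """Exhaustive search over all M^2 pairs with the metric of Equation 15.4-19."""
    def metric(pair):
        a, b = pair
        return (abs(y1 - h11 * a - h12 * b) ** 2
                + abs(y2 + h11 * b.conjugate() - h12 * a.conjugate()) ** 2)
    return min(((a, b) for a in points for b in points), key=metric)
```

In the noise-free case the estimates equal the transmitted symbols scaled by $|h_{11}|^2 + |h_{12}|^2$, so the two single-symbol searches (2M metric evaluations) return the same pair as the joint search (M^2 evaluations).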

We make the following observations on the Alamouti STBC. First, we observe that 
the code achieves the largest possible diversity. Second, through the separation of the 
detector metrics given in Equation 15.4-22 or, equivalently, the estimates $\hat{s}_1$ and $\hat{s}_2$ given 
in Equation 15.4-25, the maximum-likelihood detector has low complexity. These two 
desirable properties were achieved as a result of the orthogonality characteristic of the 
generator matrix $G$ for the Alamouti code, which we may express as 

$$G = \begin{bmatrix} g_1 & g_2 \\ -g_2^* & g_1^* \end{bmatrix} \tag{15.4-26}$$

We observe that the column vectors $v_1 = (g_1, -g_2^*)^t$ and $v_2 = (g_2, g_1^*)^t$ are orthogonal; 
i.e., $v_1 \cdot v_2 = 0$ and, furthermore, 

$$G^H G = \left[|g_1|^2 + |g_2|^2\right] I_2 \tag{15.4-27}$$


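where $I_2$ is a $2 \times 2$ identity matrix. As a consequence of this property, when we express 
the received signal given in Equation 15.4-17 as 

$$\begin{bmatrix} y_1 \\ y_2^* \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{12}^* & -h_{11}^* \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} + \begin{bmatrix} \eta_1 \\ \eta_2^* \end{bmatrix}, \qquad y = H_{21}s + \eta \tag{15.4-28}$$

and form the estimates $\hat{s}_1$ and $\hat{s}_2$ as prescribed in Equation 15.4-25 from $y$ in Equation 
15.4-28, we obtain 

$$\begin{aligned} \hat{s} = \begin{bmatrix} \hat{s}_1 \\ \hat{s}_2 \end{bmatrix} &= \begin{bmatrix} h_{11}^* & h_{12} \\ h_{12}^* & -h_{11} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2^* \end{bmatrix} \\ &= H_{21}^H H_{21} s + H_{21}^H \eta \\ &= \left[|h_{11}|^2 + |h_{12}|^2\right] s + H_{21}^H \eta \end{aligned} \tag{15.4-29}$$

Therefore, 

$$H_{21}^H H_{21} = \left[|h_{11}|^2 + |h_{12}|^2\right] I_2 \tag{15.4-30}$$

Thus, full diversity and low decoding complexity are achieved as a consequence of the 
orthogonality property of $G$ given in Equation 15.4-27. 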


Alamouti Code with Multiple Receive Antennas 

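We shall now demonstrate that the Alamouti code achieves the maximum possible 
diversity of $N_T N_R = 2N_R$ when the number of receive antennas is increased to $N_R$. In 
this case, the $N_R \times 2$ channel matrix is 

$$H = [\mathbf{h}_1 \;\; \mathbf{h}_2] = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \\ \vdots & \vdots \\ h_{N_R 1} & h_{N_R 2} \end{bmatrix} \tag{15.4-31}$$

In the first transmission, the received signal is 

$$\mathbf{y}_1 = H \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} + \boldsymbol{\eta}_1 \tag{15.4-32}$$

and in the second transmission, the received signal is 

$$\mathbf{y}_2 = H \begin{bmatrix} -s_2^* \\ s_1^* \end{bmatrix} + \boldsymbol{\eta}_2 \tag{15.4-33}$$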


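As in the case of the MISO $N_T = 2$, $N_R = 1$ system, we may combine Equations 
15.4-32 and 15.4-33 into the equation 

$$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2^* \end{bmatrix} = H_{2N_R} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} + \begin{bmatrix} \boldsymbol{\eta}_1 \\ \boldsymbol{\eta}_2^* \end{bmatrix} \tag{15.4-34}$$

where $H_{2N_R}$ is defined as follows: 

$$H_{2N_R} = \begin{bmatrix} \mathbf{h}_1 & \mathbf{h}_2 \\ \mathbf{h}_2^* & -\mathbf{h}_1^* \end{bmatrix} \tag{15.4-35}$$

Here $\mathbf{h}_1$ and $\mathbf{h}_2$ are the column vectors of the channel matrix given in Equation 15.4-31. 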
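
Suppose we form the estimates $\hat{s}_1$ and $\hat{s}_2$ as 

$$\begin{bmatrix} \hat{s}_1 \\ \hat{s}_2 \end{bmatrix} = H_{2N_R}^H \begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2^* \end{bmatrix} = H_{2N_R}^H H_{2N_R} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} + H_{2N_R}^H \begin{bmatrix} \boldsymbol{\eta}_1 \\ \boldsymbol{\eta}_2^* \end{bmatrix} \tag{15.4-36}$$

It is easily verified that 

$$H_{2N_R}^H H_{2N_R} = \left[\sum_{i=1}^{N_R} \left(|h_{i1}|^2 + |h_{i2}|^2\right)\right] I_2 = \|H\|_F^2 \, I_2 \tag{15.4-37}$$

Consequently, Equation 15.4-36 simplifies to 

$$\begin{bmatrix} \hat{s}_1 \\ \hat{s}_2 \end{bmatrix} = \|H\|_F^2 \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} + H_{2N_R}^H \begin{bmatrix} \boldsymbol{\eta}_1 \\ \boldsymbol{\eta}_2^* \end{bmatrix} \tag{15.4-38}$$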


We conclude that the Alamouti code achieves the full diversity of $2N_R$ available 
in the MIMO system with $N_T = 2$ transmit and $N_R$ receive antennas. Furthermore, the 
maximum-likelihood decoder bases its decisions on the decoupled estimates $\hat{s}_1$ and $\hat{s}_2$ 
obtained from Equation 15.4-36 as 

$$\begin{aligned} \hat{s}_1 &= \mathbf{h}_1^H \mathbf{y}_1 + \mathbf{y}_2^H \mathbf{h}_2 \\ \hat{s}_2 &= \mathbf{h}_2^H \mathbf{y}_1 - \mathbf{y}_2^H \mathbf{h}_1 \end{aligned} \tag{15.4-39}$$

Hence, implementation complexity of the detector is minimized. 
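
As an illustration of Equations 15.4-32, 15.4-33, and 15.4-39, the sketch below builds the noise-free received vectors for an arbitrary number of receive antennas and forms the two decoupled estimates. It is a minimal pure-Python sketch; the function names and the list-of-complex representation of the column vectors $\mathbf{h}_1$ and $\mathbf{h}_2$ are assumptions made for the example.

```python
def received_vectors(h1, h2, s1, s2):
    """Noise-free y1 and y2 of Equations 15.4-32 and 15.4-33.

    h1 and h2 are the two columns of H, given as lists of complex numbers
    with one entry per receive antenna.
    """
    y1 = [a * s1 + b * s2 for a, b in zip(h1, h2)]
    y2 = [-a * s2.conjugate() + b * s1.conjugate() for a, b in zip(h1, h2)]
    return y1, y2

def combine(y1, y2, h1, h2):
    """Decoupled estimates of Equation 15.4-39."""
    s1_hat = (sum(a.conjugate() * y for a, y in zip(h1, y1))
              + sum(y.conjugate() * b for y, b in zip(y2, h2)))
    s2_hat = (sum(b.conjugate() * y for b, y in zip(h2, y1))
              - sum(y.conjugate() * a for y, a in zip(y2, h1)))
    return s1_hat, s2_hat

def frobenius_sq(h1, h2):
    """Squared Frobenius norm of H = [h1 h2]."""
    return sum(abs(a) ** 2 + abs(b) ** 2 for a, b in zip(h1, h2))
```

Without noise, `combine` returns each transmitted symbol multiplied by the squared Frobenius norm of the channel matrix, which is the scaling that appears in Equation 15.4-38.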


Orthogonal Code Design for $N_T > 2$ Transmit Antennas 

The design of orthogonal generator matrices for more than $N_T = 2$ transmit antennas 
has been extensively studied. Jafarkhani (2005) gives a comprehensive treatment on 
their construction based on early work by Hurwitz and Radon (1922) on the design of 
real orthogonal matrices. A real $N \times N$ matrix $G$ with entries $g_1, -g_1, g_2, -g_2, \ldots,$ 
$g_N, -g_N$ is said to be orthogonal if 

$$G^t G = \left(\sum_{i=1}^{N} g_i^2\right) I_N \tag{15.4-40}$$

where $I_N$ is the $N \times N$ identity matrix. It can be shown (Jafarkhani (2005)) that rate 
$R_s = 1$ real orthogonal matrix designs exist only for $N = 2, 4, 8$. For example, a real 
orthogonal matrix for $N_T = 4$ transmit antennas is the following: 


$$G = \begin{bmatrix} g_1 & g_2 & g_3 & g_4 \\ -g_2 & g_1 & -g_4 & g_3 \\ -g_3 & g_4 & g_1 & -g_2 \\ -g_4 & -g_3 & g_2 & g_1 \end{bmatrix} \tag{15.4-41}$$

With $\{g_i\}$ equal to $\{s_i\}$ in the generator matrix in Equation 15.4-41, this code transmits 
four symbols in four consecutive time slots. Hence, $R_s = 1$ for this code. 
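
The orthogonality of the $4 \times 4$ real design can be confirmed by direct multiplication. The sketch below is a small pure-Python check with hypothetical function names; it builds the matrix of Equation 15.4-41 from four real symbols and forms $G^t G$.

```python
def real_orthogonal_4(g1, g2, g3, g4):
    """Generator matrix of Equation 15.4-41 (rows are time slots)."""
    return [[ g1,  g2,  g3,  g4],
            [-g2,  g1, -g4,  g3],
            [-g3,  g4,  g1, -g2],
            [-g4, -g3,  g2,  g1]]

def gram(G):
    """Return G^t G as a list of rows."""
    n = len(G[0])
    return [[sum(row[i] * row[j] for row in G) for j in range(n)]
            for i in range(n)]
```

For example, with the 4-PAM symbols (-3, 1, 3, -1) the product is 20 times the identity matrix, since 9 + 1 + 9 + 1 = 20 and every pair of distinct columns has zero inner product.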

Real orthogonal generator matrices are suitable for transmitting PAM signal constellations 
and square QAM signal constellations that can be decoupled into two separate 
PAM signal constellations. Real orthogonal generator matrix designs provide a diversity 
of order $N_T N_R$ and result in simple maximum-likelihood decoding by decoupling 
the decision for each transmitted symbol. 

The orthogonality property which results in a low-complexity maximum-likelihood 
detector can be achieved for $N > 8$ at the cost of a lower spatial rate. Such space-time 
block codes are called generalized orthogonal codes and are defined by a $K \times N$ 
generator matrix $G$ with real entries $g_1, -g_1, g_2, -g_2, \ldots, g_K, -g_K$ that satisfies the 
property 

$$G^t G = b \left(\sum_{i=1}^{K} g_i^2\right) I_N$$

where $b$ is a constant. The spatial rate is $R_s = K/N$. 

The Alamouti code is an example of an orthogonal complex matrix design for 
$N_T = 2$. It has been shown in the literature (see Jafarkhani (2005) and Tarokh et al. 
(1999a)) that orthogonal complex matrix designs with $R_s = 1$ do not exist for $N_T > 2$ 
transmit antennas. However, by reducing the code rate, it is possible to devise complex 
orthogonal designs for two-dimensional signal constellations. For example, an orthogonal 
generator matrix for a STBC that transmits four complex-valued (PSK or QAM) 
symbols on $N_T = 4$ transmit antennas is 

$$G = \begin{bmatrix} s_1 & s_2 & s_3 & s_4 \\ -s_2 & s_1 & -s_4 & s_3 \\ -s_3 & s_4 & s_1 & -s_2 \\ -s_4 & -s_3 & s_2 & s_1 \\ s_1^* & s_2^* & s_3^* & s_4^* \\ -s_2^* & s_1^* & -s_4^* & s_3^* \\ -s_3^* & s_4^* & s_1^* & -s_2^* \\ -s_4^* & -s_3^* & s_2^* & s_1^* \end{bmatrix} \tag{15.4-42}$$


For this code generator, the four complex-valued symbols are transmitted in eight 
consecutive time slots. Hence the spatial rate for this code is $R_s = 1/2$. We also 
observe that 

$$G^H G = 2\left[\sum_{i=1}^{4} |s_i|^2\right] I_4 \tag{15.4-43}$$

so that this code provides fourth-order diversity in the case of one receive antenna and 
$4N_R$ diversity with $N_R$ receive antennas. 
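
A numerical check of the orthogonality of the eight-row design in Equation 15.4-42 is sketched below in pure Python (the function names are hypothetical). Because every column contains each symbol twice, once unconjugated and once conjugated, the diagonal of $G^H G$ evaluates to twice the total symbol energy, while all off-diagonal entries vanish.

```python
def rate_half_code(s1, s2, s3, s4):
    """Eight-row generator matrix of Equation 15.4-42 (rows are time slots)."""
    top = [[ s1,  s2,  s3,  s4],
           [-s2,  s1, -s4,  s3],
           [-s3,  s4,  s1, -s2],
           [-s4, -s3,  s2,  s1]]
    # The last four rows are the complex conjugates of the first four.
    bottom = [[x.conjugate() for x in row] for row in top]
    return top + bottom

def gram_h(G):
    """Return G^H G as a list of rows."""
    n = len(G[0])
    return [[sum(row[i].conjugate() * row[j] for row in G) for j in range(n)]
            for i in range(n)]
```

The scale factor on the identity matrix does not affect the diversity order, which is determined by the fact that $G^H G$ is a multiple of $I_4$.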

Complex orthogonal matrices with rate $R_s \le 1/2$ exist for any number of transmit 
antennas. However, Wang and Xia (2003) have shown that complex orthogonal matrices 
for rates $R_s > 3/4$ do not exist for more than two transmit antennas. Rate $R_s = 3/4$ 
complex orthogonal matrices do exist. The following $R_s = 3/4$ complex orthogonal 
generator matrices are given in the paper by Tarokh et al. (1999a) for $N_T = 3$ and 
$N_T = 4$ transmit antennas: 

$$G = \begin{bmatrix} s_1 & s_2 & s_3/\sqrt{2} \\ -s_2^* & s_1^* & s_3/\sqrt{2} \\ s_3^*/\sqrt{2} & s_3^*/\sqrt{2} & (-s_1 - s_1^* + s_2 - s_2^*)/2 \\ s_3^*/\sqrt{2} & -s_3^*/\sqrt{2} & (s_2 + s_2^* + s_1 - s_1^*)/2 \end{bmatrix} \tag{15.4-44}$$

$$G = \begin{bmatrix} s_1 & s_2 & s_3/\sqrt{2} & s_3/\sqrt{2} \\ -s_2^* & s_1^* & s_3/\sqrt{2} & -s_3/\sqrt{2} \\ s_3^*/\sqrt{2} & s_3^*/\sqrt{2} & (-s_1 - s_1^* + s_2 - s_2^*)/2 & (-s_2 - s_2^* + s_1 - s_1^*)/2 \\ s_3^*/\sqrt{2} & -s_3^*/\sqrt{2} & (s_2 + s_2^* + s_1 - s_1^*)/2 & -(s_1 + s_1^* + s_2 - s_2^*)/2 \end{bmatrix} \tag{15.4-45}$$

Finally, we should indicate that orthogonal generator matrix designs are not unique. 
To demonstrate this point, let $U$ denote a unitary matrix, i.e., $U^H U = I$, and let $G$ be 
a complex orthogonal matrix. Define $G_u = UG$. Then 

$$\begin{aligned} G_u^H G_u &= (UG)^H UG \\ &= G^H U^H U G \\ &= G^H G \end{aligned} \tag{15.4-46}$$

Hence, a system employing the generator matrix $G_u$ has the same properties as a system 
employing $G$. 


Quasi-orthogonal Space-Time Block Codes 

As we have observed, orthogonal 
STBCs have the desirable property that the maximum-likelihood (ML) detector reduces 
to one that detects each symbol separately. Furthermore, for $N = 2$, 4, and 8, a real 
orthogonal STBC yields full diversity. Similarly, for $N = 2$, the Alamouti code with 
complex elements yields full diversity. We also observed that by reducing the code rate, 
it is possible to design (generalized) orthogonal codes having either real or complex 
elements. Thus, the low complexity of separate symbol detection can be maintained at 
the expense of a reduced rate and diversity. 

On the other hand, we may relax the orthogonality condition which results in 
separate ML detection and attempt to design STBC with spatial rate $R_s = 1$ and full 
diversity. The simplest detector of such a design is one that allows for pairwise ML 
symbol detection. Such a code is called quasi-orthogonal. For example, a complex 
quasi-orthogonal STBC with rate $R_s = 1$ is specified by the generator matrix 

$$G = \begin{bmatrix} s_1 & s_2 & s_3 & s_4 \\ -s_2^* & s_1^* & -s_4^* & s_3^* \\ -s_3^* & -s_4^* & s_1^* & s_2^* \\ s_4 & -s_3 & -s_2 & s_1 \end{bmatrix}$$

The transmitted symbols for this code can be optimally detected by a pairwise ML 
detector, and the code yields full diversity (see Problem 15.23). 

Differential Space-Time Block Codes 

In the application of the Alamouti code, as we have observed, it is assumed that the 
channel path coefficients $\{h_{ij}\}$ are constant over two successive time intervals. For 
Nt > 2 transmit antennas, the time interval over which the channel path coefficients 
are assumed to be constant is even larger. For example, the STBCs given in Equations 
15.4-41, 15.4-44, and 15.4-45 are constructed based on the assumption that the 
channel path coefficients are constant over four time intervals. In a fading channel, 
this assumption is usually not satisfied precisely. That is, in practice, the channel path 
coefficients vary to some extent from one time interval to another. Consequently, the 
performance of the coherent detector may be degraded by the channel variation from 
one signal interval to the next. Further deterioration in the performance of the detector is 
caused by noisy estimates of the channel path coefficients $\{h_{ij}\}$. Typically, in practical 
systems, the transmitter sends pilot signals that the receiver uses to obtain estimates 
of the channel path coefficients. Then the estimates are used in the demodulation and 
detection of the STBC. In general, these estimates are noisy and cause some deteri- 
oration in the performance of the system. The effects of channel time variations and 
noisy channel estimates on the performance of the STBC have received considerable 
attention in the technical literature, e.g., Tarokh et al. (1999b), Buehrer and Kumar 
(2002), Gu and Leung (2003), and Jootar et al. (2005). 

In rapidly fading channels, where the channel time variations preclude the use of 
coherent STBC, one may employ differential space-time modulation, which is akin to 
differential PSK (DPSK). Differential STBCs do not require knowledge of the channel 
path coefficients at the receiver. Consequently, the detector performs differentially 
coherent detection. As a result, the performance achieved by a differential STBC on 
a Rayleigh fading channel is approximately 3 dB worse than the performance of a 
coherently detected STBC. Differential STBCs are described in the papers by Tarokh 
and Jafarkhani (2000), Hughes (2000), Hochwald and Sweldens (2000), Tao and Cheng 
(2001), Jafarkhani and Tarokh (2001), Jafarkhani (2003), and Chen et al. (2003). 

15.4-4 Pairwise Error Probability for a Space-Time Code 

In this section we derive an expression for the pairwise error probability for a space-time 
coded MIMO system that is communicating over a frequency-nonselective Rayleigh 
fading channel. The MIMO system is assumed to employ a STBC for $N_T$ transmit 
antennas and have spatial rate $R_s = N_T/N$, where $N$ is the block length (number of 
time slots used to transmit the block code). 

Let us denote the signal elements transmitted in each time slot by the vector $s(l) = 
[s_1(l) \; s_2(l) \cdots s_{N_T}(l)]^t$ for $1 \le l \le N$ and let the space-time codeword be denoted by the 
$N_T \times N$ matrix $S = [s(1) \; s(2) \cdots s(N)]$. Then the transmitted signal may be expressed 
in matrix form as 



(15.4-47) 


and the received signal may be expressed as 



(15.4-48) 

where $H$ is the $N_R \times N_T$ channel matrix with path coefficients $\{h_{ij}\}$, which are constant 
over the entire codeword, $Y = [y(1) \; y(2) \cdots y(N)]$ with 

$$y(l) = \sqrt{\frac{\mathcal{E}_s}{N_T}}\, H s(l) + \eta(l), \qquad 1 \le l \le N \tag{15.4-49}$$

and $N = [\eta(1) \; \eta(2) \cdots \eta(N)]$ represents the additive noise. The noise components are 
assumed to be statistically independent and identically distributed, zero-mean, complex-valued 
Gaussian with variance $N_0$. 

The receiver employs a maximum-likelihood (ML) decoder that is assumed to 
know the channel matrix $H$. Since the additive noise components are iid, the decoder 
searches for the valid codeword that is closest in Euclidean distance to the received 
codeword. Thus, the decoder output is 

$$\hat{S} = \arg\min_{S} \left\| Y - \sqrt{\frac{\mathcal{E}_s}{N_T}}\, H S \right\|_F^2 \tag{15.4-50}$$


Let us assume that the codeword $S^{(k)}$ was transmitted. Then the pairwise error 
probability (PEP) that $S^{(j)}$ is selected when $S^{(k)}$ is transmitted, for any given channel 
matrix realization, is 

$$P\left(S^{(k)} \to S^{(j)} \mid H\right) = Q\left(\sqrt{\frac{\mathcal{E}_s}{2N_0 N_T}\left\|H\left(S^{(k)} - S^{(j)}\right)\right\|_F^2}\right) \tag{15.4-51}$$


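It is convenient to define an $N_T \times N$ error matrix as $E_{kj} = S^{(k)} - S^{(j)}$ and to 
upper-bound the PEP by the Chernov bound 

$$P\left(S^{(k)} \to S^{(j)} \mid H\right) \le \exp\left\{-\frac{\mathcal{E}_s}{4N_0 N_T}\left\|H E_{kj}\right\|_F^2\right\} \tag{15.4-52}$$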

We can now average this conditional PEP over the statistics of the channel matrix $H$. 
Assuming that the channel path coefficients $\{h_{ij}\}$ are iid, complex-valued zero-mean 
Gaussian (spatially white channel), the average of the PEP in Equation 15.4-52 over 
the statistics of the channel path coefficients yields the upper bound on the average 
PEP as 

$$P\left(S^{(k)} \to S^{(j)}\right) \le \left[\frac{1}{\det\left(I_{N_T} + \dfrac{\mathcal{E}_s}{4N_0 N_T} E_{kj} E_{kj}^H\right)}\right]^{N_R} \le \left[\prod_{n=1}^{r} \frac{1}{1 + \dfrac{\mathcal{E}_s \lambda_n}{4N_0 N_T}}\right]^{N_R} \tag{15.4-53}$$

where $r$ is the rank of the $N_T \times N_T$ matrix $A_{kj} = E_{kj} E_{kj}^H$ and $\{\lambda_n\}$ are the nonzero 
eigenvalues of $A_{kj}$. 
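
At high SNR, where $\mathcal{E}_s/4N_0 N_T \gg 1$, the bound on the PEP may be expressed as 

$$P\left(S^{(k)} \to S^{(j)}\right) \le \left(\prod_{n=1}^{r} \lambda_n\right)^{-N_R} \left(\frac{\mathcal{E}_s}{4N_0 N_T}\right)^{-rN_R} \tag{15.4-54}$$
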
This expression for the PEP suggests the following two criteria for designing ST 
codes, namely, the rank criterion and the determinant criterion, as described in the 
paper by Tarokh et al. (1998). In applying the rank criterion, the objective is to achieve 
the maximum possible diversity of $N_T N_R$, which is obtained when the matrix $A_{kj}$ is 
full rank ($r = N_T$) for any pair of valid codewords. If $A_{kj}$ has minimum rank $r$ for a 
pair of codewords, the order of diversity is $rN_R$. In applying the determinant criterion, 
the objective is to maximize the minimum of the determinant of matrix $A_{kj}$ taken over 
all pairs $(k, j)$ of valid codewords. The term in the PEP involving the product of the 
nonzero eigenvalues of $A_{kj}$ has been termed the coding gain of the space-time code. 
Hence, the determinant criterion has the objective of maximizing the coding gain of 
the space-time code. 
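
To make the two criteria concrete, the sketch below evaluates them for the Alamouti code over all pairs of distinct codewords. It is an illustrative pure-Python sketch under stated assumptions: the helper names, the unit-energy QPSK alphabet, and the rank test by tolerance are choices made for this example. The codeword $S$ is written as an $N_T \times N$ matrix (rows are antennas, columns are time slots), so it is the transpose of the generator matrix in Equation 15.4-15.

```python
import cmath
from itertools import product

QPSK = [cmath.exp(1j * cmath.pi * k / 2) for k in range(4)]

def alamouti_codeword(s1, s2):
    """N_T x N codeword: rows are antennas, columns are time slots."""
    return [[s1, -s2.conjugate()],
            [s2,  s1.conjugate()]]

def gram_2x2(E):
    """A = E E^H for a 2 x 2 error matrix E."""
    return [[sum(E[i][k] * E[j][k].conjugate() for k in range(2))
             for j in range(2)] for i in range(2)]

def det_and_rank(A, tol=1e-9):
    """Determinant and rank of a 2 x 2 Hermitian nonnegative-definite matrix."""
    det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]).real
    trace = (A[0][0] + A[1][1]).real
    rank = 2 if det > tol else (1 if trace > tol else 0)
    return det, rank

def worst_case(points):
    """Minimum determinant and minimum rank over all distinct codeword pairs."""
    words = [alamouti_codeword(a, b) for a, b in product(points, repeat=2)]
    results = []
    for k, Sk in enumerate(words):
        for j, Sj in enumerate(words):
            if k != j:
                E = [[Sk[r][c] - Sj[r][c] for c in range(2)] for r in range(2)]
                results.append(det_and_rank(gram_2x2(E)))
    return min(d for d, _ in results), min(r for _, r in results)
```

For this code $A_{kj} = (|d_1|^2 + |d_2|^2) I_2$, where $d_1$ and $d_2$ are the symbol differences, so every pair of distinct codewords gives rank 2 and the minimum determinant is set by the closest pair of constellation points.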


15.4-5 Space-Time Trellis Codes for MIMO Channels 

We observed in Section 8.12 that trellis-coded modulation (TCM) is a combination of 
a trellis code and an appropriately selected signal constellation designed with the aim 
of achieving a coding gain. Space-time trellis coding also combines trellis coding and 
a selected signal constellation with the primary objective of achieving the maximum 
possible spatial diversity at the highest code rate. To achieve this objective, code con- 
struction may be based on applying the rank criterion and the determinant criterion 
described in Section 15.4-4. 

In applying the rank criterion, we optimize the spatial diversity obtained from 
the space-time code, or equivalently we maximize the rank of the matrices $A_{ij} = 
(S^{(i)} - S^{(j)})(S^{(i)} - S^{(j)})^H$ over all pairs $(i, j)$ of codewords. The goal is to achieve the 
full rank of $N_T$. It has been shown (see Jafarkhani (2005)) that for a bit rate of $b$ bps/Hz 
and a diversity $r$, a space-time trellis code (STTC) must have at least $2^{b(r-1)}$ states. 
Thus, to achieve full diversity, a STTC must have at least $2^{b(N_T-1)}$ states. 

Space-time trellis codes may be designed either manually or with the aid of a 
computer by following some simple rules, similar in nature to the rules formulated by 
Ungerboeck (1982) for designing trellis codes for TCM. Tarokh et al. (1998) specify 
two design rules that guarantee full diversity for MIMO systems with two transmit 
antennas. 

Design Rule 1: Transitions departing from the same state should differ in the 
second symbol (symbol transmitted on the second antenna). 

Design Rule 2: Transitions arriving at the same state should differ in the first 
symbol (symbol transmitted on the first antenna). 

As an example of a STTC, we consider the 4-state trellis code shown in Fig- 
ure 15.4-6, which is designed for two transmit antennas and QPSK modulation. 



FIGURE 15.4-6 

4-PSK, 4-state, space-time trellis code. 


The states are denoted as $S_t = 0, 1, 2, 3$. The input to the encoder is a pair of bits 
(00, 01, 10, 11) which are mapped into the corresponding phases that are numbered 
(0, 1, 2, 3), respectively. The indices 0, 1, 2, 3 correspond to the four phases, which are 
called symbols. Initially, the encoder is in state $S_t = 0$. Then for each pair of input 
bits, which are mapped into a corresponding symbol, the encoder generates a pair of 
symbols, the first of which is transmitted on the first antenna, and the second symbol is 
transmitted simultaneously on the second antenna. For example, when the encoder is 
in state $S_t = 0$ and the input bits are 11, the symbol is a 3. The STTC outputs the pair 
of symbols (0, 3), corresponding to the phases 0 and $3\pi/2$. The zero phase signal is 
transmitted on the first antenna, and the $3\pi/2$ phase signal is transmitted on the second 
antenna. At this point the encoder goes to state $S_t = 3$. If the next two input bits are 
01, the encoder outputs the symbols (3, 1) which are transmitted on the two antennas. 
Then, the encoder goes to state $S_t = 1$, and this procedure continues. At the end of a 
block of input bits, say a frame of data, zeros are inserted in the data stream to return 
the encoder to the state $S_t = 0$. Thus the STTC transmits at a bit rate of 2 bps/Hz. We 
note that it satisfies the two design rules given above and achieves full rank of $N_T = 2$. 
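
The encoder behavior in the example above can be sketched in a few lines of Python. The transition rule used here, namely that the first antenna sends the current state, the second antenna sends the new input symbol, and the new symbol becomes the next state, is an assumption inferred from the two transitions worked in the text; the trellis of Figure 15.4-6 itself is not reproduced here, and the function names are hypothetical.

```python
import cmath

def bits_to_symbols(bits):
    """Map bit pairs (00, 01, 10, 11) to the symbol indices (0, 1, 2, 3)."""
    return [2 * bits[i] + bits[i + 1] for i in range(0, len(bits), 2)]

def sttc4_encode(symbols, state=0):
    """Return the (antenna 1, antenna 2) symbol pairs and the final state.

    Assumed rule: antenna 1 sends the current state, antenna 2 sends the
    new input symbol, and the new symbol becomes the next state.
    """
    pairs = []
    for sym in symbols:
        pairs.append((state, sym))
        state = sym
    return pairs, state

def phase(sym):
    """QPSK signal point with phase sym * pi / 2."""
    return cmath.exp(1j * cmath.pi * sym / 2)
```

Under this rule, the input bits 1, 1, 0, 1 give the symbol pairs (0, 3) and (3, 1) and leave the encoder in state 1, as in the example. Branches leaving a given state differ in the second symbol and branches entering a given state differ in the first symbol, in agreement with the two design rules.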

Increasing the number of states in the trellis beyond $2^b$ states allows the designer to 
increase the coding gain by increasing the product of the eigenvalues (determinant) in 
the expression for the pairwise error probability. For example, the 8-state STTC, given 
in the paper by Tarokh et al. (1998), that transmits at a bit rate of 2 bps/Hz with QPSK 
modulation is shown in Figure 15.4-7. This code provides the same diversity order 
($2N_R$) as the 4-state STTC illustrated in Figure 15.4-6, but achieves a larger coding gain. 

FIGURE 15.4-7 

4-PSK, 8-state, space-time trellis code. 


The paper by Tarokh et al. (1998) also describes higher rate codes for two transmit 
antennas. For example, Figure 15.4-8 illustrates an 8-state STTC for use with 8-PSK 
modulation to achieve a bit rate of 3 bps/Hz and full diversity of Nt = 2. STTC for 
large constellations employing QAM are given in the paper by Tarokh et al. (1998) and 
other publications in the literature. 

FIGURE 15.4-8 

8-PSK, 8-state, space-time trellis code. Example from the figure: the input symbols 
0 1 7 5 4 6 produce 0 0 5 3 1 4 on antenna 1 and 0 1 7 5 4 6 on antenna 2. 

In decoding a STTC, the maximum-likelihood sequence detection (MLSD) criterion 
provides the optimum performance. MLSD is efficiently implemented by use of the 
Viterbi algorithm. For two transmit antennas, the branch metrics may be expressed as 

$$\mu_b(s_1, s_2) = \sum_{j=1}^{N_R} \left|y_j - h_{1j} s_1 - h_{2j} s_2\right|^2 \tag{15.4-55}$$

where $\{y_j, 1 \le j \le N_R\}$ are the outputs of the matched filters at the $N_R$ receive 
antennas, $\{h_{1j}, 1 \le j \le N_R\}$ and $\{h_{2j}, 1 \le j \le N_R\}$ are the channel coefficients in 
a frequency-nonselective channel, and $(s_1, s_2)$ denote the symbols transmitted on the 
two antennas. By using these branch metrics in the Viterbi algorithm to form the path 
metrics of valid paths through the trellis, we can find the path that minimizes the overall 
metric and thus determine the sequence of transmitted symbols corresponding to the 
path having the smallest path metric. 
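
The branch metric of Equation 15.4-55 is simple to compute. The following pure-Python sketch is illustrative only: the function names and the representation of the channel as two lists of complex coefficients, one entry per receive antenna, are assumptions. A full Viterbi decoder would evaluate this metric for the symbol pair on every trellis branch and accumulate the results along surviving paths.

```python
import cmath

def psk_point(index, order=4):
    """PSK signal point with phase 2 * pi * index / order."""
    return cmath.exp(2j * cmath.pi * index / order)

def branch_metric(y, h1, h2, s1, s2):
    """Equation 15.4-55: sum over receive antennas of |y_j - h_1j s1 - h_2j s2|^2.

    y holds the matched-filter outputs; h1[j] and h2[j] are the coefficients
    from transmit antennas 1 and 2 to receive antenna j.
    """
    return sum(abs(yj - a * s1 - b * s2) ** 2 for yj, a, b in zip(y, h1, h2))

def best_branch(y, h1, h2, branches, order=4):
    """Among candidate (symbol 1, symbol 2) index pairs, pick the smallest metric."""
    return min(branches,
               key=lambda br: branch_metric(y, h1, h2,
                                            psk_point(br[0], order),
                                            psk_point(br[1], order)))
```

For instance, `best_branch` can be applied to the four branches leaving one state of a 4-state QPSK trellis to see which branch the received samples favor at that step.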


15.4-6 Concatenated Space-Time Codes and Turbo Codes 

In Section 15.4-2, we observed that temporal coding with interleaving provides a means 
to achieve diversity in a MIMO system. It is also possible to construct concatenated 
codes using temporal coding with interleaving in combination with space-time codes. 
Figure 15.4-9 illustrates a system in which the input data stream is temporally coded 
by either a block code or a convolutional code. Following the temporal encoding, the 
data are bit-interleaved and passed to the space-time encoder, which may be either a 
STBC or a STTC. 

At the receiver, the space-time code is decoded first, and its output is deinterleaved 
and passed to the outer decoder. The output of the outer decoder constitutes the 
reconstructed data stream. If desired, iterative decoding can be performed between the 
inner and outer decoders by making multiple passes on the received data signal. Such 
iterative decoding leads to an improvement in system performance but at a significant 
cost in implementation (computational) complexity. 

FIGURE 15.4-9 

A MIMO system with concatenated coding consisting of a temporal outer code and a 
space-time inner code (dotted lines in the receiver indicate iterative decoding). 
(a) Transmitter. (b) Receiver. 

A turbo code (parallel concatenated convolutional encoders separated by an inter- 
leaver) can also be used as the outer code in a concatenated coding scheme, as shown 
in Figure 15.4-9. In such a case, the outer decoder at the receiver is a turbo (iterative) 
decoder. Iterative decoding can also be implemented between the turbo decoder and 
the space-time decoder. However, iterative decoding between the inner space-time de- 
coder and the turbo decoder significantly increases the computational complexity of 
the receiver. 


15.5 

BIBLIOGRAPHICAL NOTES AND REFERENCES 

The use of multiple antennas at the receiver of the communication system has been a 
well-known method for achieving spatial diversity to combat fading without expand- 
ing the bandwidth of the transmitted signal. Much less common has been the use of 
multiple antennas at the transmitter to achieve spatial diversity. The publications of 
Wittneben (1993) and Seshadri and Winters (1994) are two of the early publications on 
this topic. 

A major breakthrough occurred with the publications of Foschini (1996) and 
Foschini and Gans (1998), which demonstrated that multiple antennas at the trans- 
mitter and the receiver of a wireless communication system can be used to establish 
multiple parallel channels for simultaneous transmission of multiple data streams in 
the same frequency band (spatial multiplexing) and, thus, result in extremely high 
bandwidth efficiency. Since then, there have been numerous publications on the analy- 
sis of the performance characteristics of MIMO wireless communication systems and 
their implementation in practical systems. Basic treatments of MIMO systems may be 
found in the textbooks by Goldsmith (2005), Haykin and Moher (2005), and Tse and 
Viswanath (2005). 

Pioneering work on space-time coding for MIMO channels was performed by 
Tarokh et al. (1998, 1999a). The book by Jafarkhani (2005) provides a comprehensive 
treatment of both space-time block codes and trellis codes. 


PROBLEMS 

15.1 Consider an $(N_T, N_R) = (2, 1)$ MIMO system that employs the Alamouti code to transmit 
a binary sequence using binary PSK modulation. The channel is Rayleigh fading 
characterized by the channel vector 

$$h = [h_{11} \;\; h_{12}]^t$$

with $E|h_{11}|^2 = E|h_{12}|^2 = 1$. The additive noise is zero-mean Gaussian. Determine the 
average probability of error for the system. 

15.2 Consider a SIMO AWGN channel with $N_R$ receive antennas. Instead of maximal ratio 
combining, the receiver selects the signal from the antenna having the strongest signal; 
i.e., if $h = [h_1, h_2, \ldots, h_{N_R}]$ is the channel vector, the receiver selects the antenna with 
channel coefficient 

$$|h_{\max}| = \max_i |h_i|, \qquad i = 1, 2, \ldots, N_R$$

This method is called selection diversity. Determine the capacity of a MIMO system that 
employs selection diversity. 

15.3 Prove the relationship between the eigenvalues of $HH^H$ and the singular values of the 
channel matrix $H$, as given by Equation 15.2-4. 


15.4 Consider a MIMO system with $N_R = N_T = N$ antennas and AWGN. The ergodic 
capacity for the MIMO system is 

$$C = E\left[\sum_{i=1}^{N} \log_2\left(1 + \frac{\mathcal{E}_s}{N N_0}\lambda_i\right)\right]$$

Show that for $N$ large, the capacity can be approximated as 

$$C \approx \frac{\mathcal{E}_s}{N_0 \ln 2}\,\lambda_m$$

where $\lambda_m$ is the average of the eigenvalues of $HH^H$. 

15.5 Consider a deterministic SIMO channel with AWGN in which the elements of the channel 
vector $h$ satisfy the conditions $|h_i|^2 = 1$, $i = 1, 2, \ldots, N_R$. 

a. Determine the capacity of this SIMO channel when h is known at the receiver only. 

b. Suppose that h is also known at the transmitter. Does this additional knowledge 
increase the channel capacity? Explain. 

15.6 Consider a deterministic MISO channel with AWGN in which the elements of the channel 
vector $h$ satisfy the conditions $|h_i|^2 = 1$, $i = 1, 2, \ldots, N_T$. 

a. Determine the capacity of this MISO channel when h is known at the receiver only. 

b. How does this capacity compare with that of a SIMO and a SISO channel? 

15.7 Consider a MIMO system with $N_R = N_T = N$ antennas and AWGN. The rank of the
channel matrix $\mathbf{H}$ is N.

a. Show that the capacity

$$C = \sum_{i=1}^{N} \log_2\left(1 + \frac{\mathcal{E}_s}{N N_0}\lambda_i\right)$$

subject to the constraint that

$$\sum_{i=1}^{N} \lambda_i = \beta = \text{constant}$$

is maximized when $\lambda_i = \beta/N$ for $i = 1, 2, \ldots, N$, and hence

$$C = N \log_2\left(1 + \frac{\beta\,\mathcal{E}_s}{N^2 N_0}\right)$$

b. If $\lambda_i = \beta/N$ for $i = 1, 2, \ldots, N$, show that $\mathbf{H}$ must be an orthogonal matrix that
satisfies the condition

$$\mathbf{H}\mathbf{H}^H = \mathbf{H}^H\mathbf{H} = \frac{\beta}{N}\,\mathbf{I}_N$$

c. Show that if all the elements of $\mathbf{H}$ are unit magnitude, i.e., $|h_{ij}| = 1$, then $\|\mathbf{H}\|_F^2 = N^2$
and

$$C = N \log_2\left(1 + \frac{\mathcal{E}_s}{N_0}\right)$$

Hence, under these conditions, the capacity of the orthogonal MIMO channel is N times
the capacity of a SISO channel.

15.8 The received signal vector in a frequency-nonselective AWGN MIMO channel with $N_T$
transmit antennas and $N_R$ receive antennas is given by Equation 15.2-7 as

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \boldsymbol{\eta}$$

a. Use the SVD to transform the received signal vector to the form

$$\mathbf{y}' = \boldsymbol{\Sigma}\mathbf{s}' + \boldsymbol{\eta}'$$

where $\boldsymbol{\Sigma}$ is a diagonal matrix of rank r with the nonzero diagonal elements equal to
the singular values $\sigma_1, \ldots, \sigma_r$ of the channel matrix $\mathbf{H}$.

b. Show that if the elements of $\boldsymbol{\eta}$ are statistically iid, zero-mean, complex-valued Gaussian
random variables, then the elements of $\boldsymbol{\eta}'$ are also iid zero-mean complex-valued
Gaussian random variables.

c. Show that the capacity of the AWGN MIMO channel may be expressed as

$$C = \sum_{k=1}^{r} \log_2\left(1 + \frac{P_k\,\sigma_k^2}{N_0}\right) \ \text{bps/Hz}$$

where $P_1, P_2, \ldots, P_r$ are the allocated powers based on the water-filling criterion with
the total power constraint

$$\sum_{k=1}^{r} P_k = P$$

15.9 The capacity of a MISO channel with AWGN, when the channel is known at the receiver
only, may be expressed as

$$C = \log_2\left(1 + \frac{\gamma}{N_T}\sum_{i=1}^{N_T} |h_i|^2\right)$$

where $\gamma$ is the SNR and $\mathbf{h} = [h_1 \; h_2 \; \cdots \; h_{N_T}]^t$ is the channel coefficient vector. Suppose
the channel coefficients are iid zero-mean, complex Gaussian distributed with $E|h_i|^2 = 1$,
$i = 1, 2, \ldots, N_T$.

a. Determine the PDF of the random variable

$$X = \sum_{i=1}^{N_T} |h_i|^2$$

b. Note that C is a monotonic function of X. Show that the outage probability for the
MISO system may be expressed as

$$P_{out} = P\left[X < N_T\,\frac{2^C - 1}{\gamma}\right]$$

c. Evaluate and plot $P_{out}$ versus $\gamma$ for C = 2 bps/Hz and $N_T$ = 1, 2, 4, 8.

d. For $\gamma$ = 10 dB, evaluate and plot the complementary cumulative distribution function
(CCDF)

$$1 - P_{out} = P\left[X > N_T\,\frac{2^C - 1}{\gamma}\right]$$

versus C for $N_T$ = 1, 2, 4, 8. This is the CCDF for the outage capacity. Repeat the
computation for $\gamma$ = 20 dB.

e. Let $P_{out}$ = 0.1 (corresponding to 10% outage capacity) and plot C versus $\gamma$ for
$N_T$ = 1, 2, 4, 8.


15.10 Consider a deterministic MISO $(N_T, 1)$ channel with AWGN and channel vector $\mathbf{h}$. The
received signal in any signal interval may be expressed as

$$y = \mathbf{h}\mathbf{s} + \eta$$

where y and $\eta$ are scalars.

a. If the channel vector $\mathbf{h}$ is known at the transmitter, demonstrate that the received SNR
is maximized when the information is sent in the direction of the channel vector $\mathbf{h}$,
i.e., $\mathbf{s}$ is selected as

$$\mathbf{s} = \frac{\mathbf{h}^H}{\|\mathbf{h}\|}\,s$$

where s is the information symbol.
(The alignment of the transmit signal in the direction of the channel vector $\mathbf{h}$ is called
transmit beamforming.)

b. What is the capacity of the MISO channel when $\mathbf{h}$ is known at the transmitter?

c. Compare the capacity obtained in (b) with that of a SIMO channel, when the channel
vector $\mathbf{h}$ is identical for the two systems.

15.11 Determine the outage probability of an $(N_T, N_R) = (4, 1)$ MIMO system for an SNR
$\gamma$ = 20 dB and outage capacity $C_{out}$ = 2 bps/Hz.

15.12 The capacity of a SIMO channel with AWGN may be expressed as

$$C = \log_2\left(1 + \gamma\sum_{i=1}^{N_R} |h_i|^2\right)$$

where $\gamma$ is the SNR and $\mathbf{h} = [h_1 \; h_2 \; \cdots \; h_{N_R}]^t$ is the channel coefficient vector. The channel
coefficients are complex-valued, iid zero-mean Gaussian distributed with $E|h_i|^2 = 1$,
$i = 1, 2, \ldots, N_R$.

a. Determine the PDF of the random variable

$$X = \sum_{i=1}^{N_R} |h_i|^2$$

b. Note that C is a monotonic function of X. Show that the outage probability for the
SIMO system may be expressed as

$$P_{out} = P\left[X < \frac{2^C - 1}{\gamma}\right]$$

c. Evaluate and plot $P_{out}$ versus $\gamma$ for C = 2 bps/Hz and $N_R$ = 1, 2, 4, 8.

d. For $\gamma$ = 10 dB, evaluate and plot the complementary cumulative distribution function
(CCDF)

$$1 - P_{out} = P\left[X > \frac{2^C - 1}{\gamma}\right]$$

versus C for $N_R$ = 1, 2, 4, 8. This is the CCDF for the outage capacity. Repeat for
$\gamma$ = 20 dB.

e. Let $P_{out}$ = 0.1 (corresponding to 10% outage capacity) and plot C versus $\gamma$ for
$N_R$ = 1, 2, 4, 8.

15.13 Consider an $(N_T, N_R) = (2, N_R)$ MIMO system that employs the Alamouti code with
QPSK modulation. If the input bit stream is 01101001110010, determine the transmitted
symbols from each antenna for each signaling interval.

15.14 Show that the detector that computes the estimates $\hat{s}_1$ and $\hat{s}_2$ given by Equation 15.4-25
is equivalent to the detector that computes the correlation metrics in Equation 15.4-22.

15.15 Determine the decision variables for the separate ML decoding of the symbols in the 
following rate 3/4 block code. 


C = 


si s 2 

_ C* P* 

s* 0 


S3 

0 



15.16 Determine the decision variables for the separate ML decoding of the symbols in the rate
1/2 orthogonal STBC given by Equation 15.4-42.


15.17 Determine the probability of error for the detector with input metrics given by
Equation 15.3-5 for BPSK modulation and a Rayleigh fading channel. Assume that the
components of $\mathbf{h}_j$ are iid, zero-mean, complex-valued Gaussian random variables.

15.18 For a Rayleigh fading channel and BPSK modulation, compare the performance of a
MISO (2, 1) system employing the Alamouti code with that of a SIMO (1, 2) system.
Assume that the transmitter power is the same for the two systems.

15.19 Consider a MISO (2, 1) system in which the Alamouti code is used in conjunction with
multicode spread spectrum. To be specific, suppose that the symbol $s_1$ is spread by code
$C_1$ and $-s_2^*$ is spread by code $C_2$. These two spread spectrum signals are added and
transmitted on antenna 1. Similarly, the symbol $s_2$ is spread by $C_1$ and the symbol $s_1^*$ is
spread by the code $C_2$. Then these two spread spectrum signals are added and transmitted on
antenna 2. The channel coefficients $h_1$ and $h_2$ are known at the receiver.

a. Sketch the block diagram configuration of the transmitter and the receiver, illustrating 
the modulation and demodulation operations. 

b. Assuming that the spreading codes $C_1$ and $C_2$ are orthogonal, determine the expressions
for the decision variables $\hat{s}_1$ and $\hat{s}_2$.

c. What, if any, are the advantages and disadvantages of this multicode MISO (2, 1) 
system over the conventional MISO (2, 1) system that employs the Alamouti STBC 
without the multicode spreading? 

15.20 Consider an uncoded MIMO system with $N_T = N_R$ antennas that transmits over a
frequency-nonselective channel in which the channel matrix $\mathbf{H}$ has iid complex-valued,
zero-mean Gaussian elements. The received signal vector is

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \boldsymbol{\eta}$$

where the elements of $\boldsymbol{\eta}$ are iid complex-valued, zero-mean Gaussian. The detector used
at the receiver is the inverse channel detector (ICD), described in Section 15.1-2.

a. Determine the covariance matrix of the noise at the output of the detector.

b. If the detector makes independent decisions on each of the $N_T$ transmitted symbols,
is this detector optimum (in the sense of minimizing the error probability)?

c. If BPSK modulation is employed, determine the error probability of the detector
described in (b).

d. Now, suppose that $N_R > N_T$ and the decisions made by the detector are based on the
signal estimate $\hat{\mathbf{s}} = \mathbf{W}^H\mathbf{y}$, where $\mathbf{W}^H = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H$. Repeat parts (a) and (b).

15.21 The channel matrix in an $N_T = N_R = 2$ MIMO system with AWGN is

$$\mathbf{H} = \begin{bmatrix} 0.4 & 0.5 \\ 0.7 & 0.3 \end{bmatrix}$$

a. Determine the SVD of H. 

b. Based on the SVD of H, determine an equivalent MIMO system having two indepen- 
dent channels, and find the optimal power allocation and channel capacity when H is 
known at the transmitter and the receiver. 

c. Determine the channel capacity when H is known only at the receiver. 

15.22 Consider the following two MISO (2, 1) systems with AWGN. The first employs the 
Alamouti code to achieve transmit diversity when the channel is known only at the 
receiver. The second MISO (2, 1) also achieves transmit diversity, but the channel is 
known at the transmitter. Determine and compare the outage probabilities for the two 
systems. Which MISO system has a lower outage probability for the same SNR?

15.23 The generator matrix for a rate $R_s = 1$ STBC is given as

$$\mathbf{G} = \begin{bmatrix} s_1 & s_2 & s_3 & s_4 \\ -s_2^* & s_1^* & -s_4^* & s_3^* \\ -s_3^* & -s_4^* & s_1^* & s_2^* \\ s_4 & -s_3 & -s_2 & s_1 \end{bmatrix}$$

a. Determine the matrix $\mathbf{G}^H\mathbf{G}$, and thus show that the code is not orthogonal.

b. Show that the ML detector can perform pairwise ML detection. 

c. What is the order of diversity achieved by this code? 



Multiuser Communications 


In the MIMO communication systems that were treated in Chapter 15, we observed 
that multiple data streams can be sent simultaneously from a transmitter employing 
multiple antennas to a receiver that employs multiple receive antennas. This type of 
a MIMO system is generally viewed as a single-user point-to-point communication 
system, having the primary objectives of increasing the data rate through spatial mul- 
tiplexing and improving the error rate performance by increasing signal diversity to 
combat fading. In this chapter, the focus shifts to multiple users and multiple commu- 
nication links. We explore the various ways in which multiple users access a common 
channel to transmit information. The multiple access methods that are described in 
this chapter form the basis for current and future wireline and wireless communication 
networks, such as satellite networks, cellular and mobile communication networks, and 
underwater acoustic networks. 


16.1 INTRODUCTION TO MULTIPLE ACCESS TECHNIQUES

It is instructive to distinguish among several types of multiuser communication systems. 
One type is a multiple access system in which a large number of users share a common 
communication channel to transmit information to a receiver. A model of such a system 
is depicted in Figure 16.1-1. The common channel may represent the uplink in either 
a cellular or a satellite communication system, or a cable to which are connected a 
number of terminals that access a central computer. For example, in a mobile cellular 
communication system, the users are the mobile terminals in any particular cell of the 
system, and the receiver resides in the base station of the particular cell. 

A second type of multiuser communication system is a broadcast network in which a
single transmitter sends information to multiple receivers, as depicted in Figure 16.1-2.
Examples of broadcast systems include the common radio and TV broadcast systems
as well as the downlinks in cellular and satellite communication systems.

FIGURE 16.1-1

A multiple access system. 

The multiple access and broadcast systems are the most common multiuser com- 
munication systems. A third type of multiuser system is a store-and-forward network, 
as depicted in Figure 16.1-3. Yet a fourth type is the two-way communication system 
shown in Figure 16.1-4. 

In this chapter, we focus on multiple access and broadcast methods for multiuser 
communications. In a multiple access system, there are several different ways in which 
multiple users can send information through the communication channel to the receiver. 
One simple method is to subdivide the available channel bandwidth into a number, say 
K, of frequency non-overlapping subchannels, as shown in Figure 16.1-5, and to assign 
a subchannel to each user upon request by the users. This method is generally called 
frequency-division multiple access (FDMA) and is commonly used in wireline channels 
to accommodate multiple users for voice and data transmission. 

Another method for creating multiple subchannels for multiple access is to subdivide
the duration $T_f$, called the frame duration, into, say, K non-overlapping subintervals,
each of duration $T_f/K$. Then each user who wishes to transmit information
is assigned to a particular time slot within each frame. This multiple access method
is called time-division multiple access (TDMA) and it is frequently used in data and
digital voice transmission.

FIGURE 16.1-2

A broadcast network.

FIGURE 16.1-3

A store-and-forward communication network with satellite relays.

We observe that in FDMA and TDMA, the channel is basically partitioned into 
independent single-user subchannels. In this sense, the communication system design 
methods that we have described for single-user communication are directly applicable 
and no new problems are encountered in a multiple access environment, except for the 
additional task of assigning users to available channels. 

The interesting problems arise when the data from the users accessing the network 
is bursty in nature. In other words, the information transmissions from a single user 
are separated by periods of no transmission, where these periods of silence may be 
greater than the periods of transmission. Such is the case generally with users at various 
terminals in a computer communication network. To some extent, this is also the case in 
mobile cellular communication systems carrying digitized voice, since speech signals 
typically contain long pauses. 

In such an environment where the transmission from the various users is bursty and 
low-duty-cycle, FDMA and TDMA tend to be inefficient because a certain percentage 
of the available frequency slots or time slots assigned to users do not carry informa- 
tion. Ultimately, an inefficiently designed multiple access system limits the number of 
simultaneous users of the channel. 

An alternative to FDMA and TDMA is to allow more than one user to share
a channel or subchannel by use of direct-sequence spread spectrum signals. In this
method, each user is assigned a unique code sequence or signature sequence that
allows the user to spread the information signal across the assigned frequency band.
Thus signals from the various users are separated at the receiver by cross correlation
of the received signal with each of the possible user signature sequences. By designing
these code sequences to have relatively small cross-correlations, the crosstalk inherent
in the demodulation of the signals received from multiple transmitters is minimized.
This multiple access method is called code division multiple access (CDMA).

FIGURE 16.1-4

A two-way communication channel.

FIGURE 16.1-5

Subdivision of the channel into non-overlapping frequency bands.

In CDMA, the users access the channel in a random manner. Hence, the signal 
transmissions among the multiple users completely overlap both in time and in fre- 
quency. The demodulation and separation of these signals at the receiver is facilitated 
by the fact that each signal is spread in frequency by the pseudorandom code sequence. 
CDMA is sometimes called spread spectrum multiple access (SSMA). 

An alternative to CDMA is nonspread random access. In such a case, when two 
users attempt to use the common channel simultaneously, their transmissions collide 
and interfere with each other. When that happens, the information is lost and must be
retransmitted. To handle collisions, one must establish protocols for retransmission of 
messages that have collided. Protocols for scheduling the retransmission of collided 
messages are described below. 


16.2 CAPACITY OF MULTIPLE ACCESS METHODS

It is interesting to compare FDMA, TDMA, and CDMA in terms of the information rate
that each multiple access method achieves in an ideal AWGN channel of bandwidth W.
Let us compare the capacity of K users, where each user has an average power $P_i = P$,
for all $1 \le i \le K$. Recall that in an ideal band-limited AWGN channel of bandwidth
W, the capacity of a single user is

$$C = W \log_2\left(1 + \frac{P}{W N_0}\right) \tag{16.2-1}$$

where $\frac{1}{2}N_0$ is the power spectral density of the additive noise.

In FDMA, each user is allocated a bandwidth W/K. Hence, the capacity of each
user is

$$C_K = \frac{W}{K} \log_2\left[1 + \frac{P}{(W/K) N_0}\right] \tag{16.2-2}$$

and the total capacity for the K users is

$$K C_K = W \log_2\left(1 + \frac{KP}{W N_0}\right) \tag{16.2-3}$$

Therefore, the total capacity is equivalent to that of a single user with average power
$P_{av} = KP$.

FIGURE 16.2-1

Normalized capacity as a function of $\mathcal{E}_b/N_0$ for FDMA.
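As a numerical illustration of Equations 16.2-1 to 16.2-3, the following minimal Python sketch evaluates the per-user and total FDMA capacities. The function names and the sample values of W, P, and $N_0$ are illustrative assumptions and are not taken from the text.

```python
import math


def single_user_capacity(W, P, N0):
    """Capacity in bits/s of a band-limited AWGN channel (Eq. 16.2-1)."""
    return W * math.log2(1.0 + P / (W * N0))


def fdma_capacity_per_user(W, P, N0, K):
    """Capacity per user when each of K users gets bandwidth W/K (Eq. 16.2-2)."""
    sub_band = W / K
    return sub_band * math.log2(1.0 + P / (sub_band * N0))


def fdma_total_capacity(W, P, N0, K):
    """Total capacity K*C_K of the K FDMA users (Eq. 16.2-3)."""
    return K * fdma_capacity_per_user(W, P, N0, K)


if __name__ == "__main__":
    # Illustrative (hypothetical) values: 1 MHz band, P/(W*N0) = 1.
    W, P, N0 = 1.0e6, 1.0e-3, 1.0e-9
    for K in (1, 2, 4, 8):
        per_user = fdma_capacity_per_user(W, P, N0, K)
        total = fdma_total_capacity(W, P, N0, K)
        print(f"K={K}: per-user {per_user:.0f} b/s, total {total:.0f} b/s")
```

Under these assumptions the total capacity for K users coincides with the single-user capacity evaluated at power KP, while the per-user capacity shrinks as K grows.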

It is interesting to note that for a fixed bandwidth W, the total capacity goes
to infinity as the number of users K increases, since the total power KP grows linearly
with K. On the other hand, as K increases, each user is allocated a smaller bandwidth
(W/K) and, consequently, the capacity per user decreases. Figure 16.2-1 illustrates the
capacity $C_K$ per user normalized by the channel bandwidth W, as a function of
$\mathcal{E}_b/N_0$, with K as a parameter. This expression is given as

$$\frac{C_K}{W} = \frac{1}{K} \log_2\left[1 + K\,\frac{C_K}{W}\left(\frac{\mathcal{E}_b}{N_0}\right)\right] \tag{16.2-4}$$


A more compact form of Equation 16.2-4 is obtained by defining the normalized total
capacity $C_n = K C_K / W$, which is the total bit rate for all K users per unit of bandwidth.
Thus, Equation 16.2-4 may be expressed as

$$C_n = \log_2\left(1 + C_n\,\frac{\mathcal{E}_b}{N_0}\right) \tag{16.2-5}$$

or, equivalently,

$$\frac{\mathcal{E}_b}{N_0} = \frac{2^{C_n} - 1}{C_n} \tag{16.2-6}$$

The graph of $C_n$ versus $\mathcal{E}_b/N_0$ is shown in Figure 16.2-2. We observe that $C_n$ increases
as $\mathcal{E}_b/N_0$ increases above the minimum value of ln 2.
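The relation in Equation 16.2-6 is easy to check numerically. The short Python sketch below (function names are illustrative assumptions) evaluates the required $\mathcal{E}_b/N_0$ for a given normalized total capacity and shows that it approaches ln 2, about -1.59 dB, as $C_n$ tends to zero.

```python
import math


def ebn0_required(c_n):
    """E_b/N_0 (linear scale) needed for normalized total capacity c_n (Eq. 16.2-6)."""
    return (2.0 ** c_n - 1.0) / c_n


def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)


if __name__ == "__main__":
    for c_n in (1e-6, 0.5, 1.0, 2.0, 4.0):
        print(f"C_n = {c_n:g} bits/s/Hz -> E_b/N_0 = {to_db(ebn0_required(c_n)):.2f} dB")
```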

In a TDMA system, each user transmits for 1/K of the time through the channel
of bandwidth W, with average power KP. Therefore, the capacity per user is

$$C_K = \frac{1}{K}\,W \log_2\left(1 + \frac{KP}{W N_0}\right) \tag{16.2-7}$$

which is identical to the capacity of an FDMA system. However, from a practical
standpoint, we should emphasize that, in TDMA, it may not be possible for the transmitters
to sustain a transmitter power of KP when K is very large. Hence, there is a practical
limit beyond which the transmitter power cannot be increased as K is increased.

FIGURE 16.2-2

Total capacity per hertz as a function of $\mathcal{E}_b/N_0$ for FDMA.

In a CDMA system, each user transmits a pseudorandom signal of a bandwidth W
and average power P. The capacity of the system depends on the level of cooperation
among the K users. At one extreme is noncooperative CDMA, in which the receiver for
each user signal does not know the codes and spreading waveforms of the other users,
or chooses to ignore them in the demodulation process. Hence, the other users' signals
appear as interference at the receiver of each user. In this case, the multiuser receiver
consists of a bank of K single-user matched filters. This is called single-user detection. If
we assume that each user's pseudorandom signal waveform is Gaussian, then each user
signal is corrupted by Gaussian interference of power $(K - 1)P$ and additive Gaussian
noise of power $W N_0$. Therefore, the capacity per user for single-user detection is

$$C_K = W \log_2\left[1 + \frac{P}{W N_0 + (K - 1)P}\right] \tag{16.2-8}$$

or, equivalently,

$$\frac{C_K}{W} = \log_2\left[1 + \frac{C_K}{W}\,\frac{\mathcal{E}_b/N_0}{1 + (K - 1)(C_K/W)(\mathcal{E}_b/N_0)}\right] \tag{16.2-9}$$


Figure 16.2-3 illustrates the graph of $C_K/W$ versus $\mathcal{E}_b/N_0$, with K as a parameter.
For a large number of users, we may use the approximation $\ln(1 + x) \le x$. Hence,

$$\frac{C_K}{W} \le \frac{C_K}{W}\,\frac{\mathcal{E}_b/N_0}{1 + K(C_K/W)(\mathcal{E}_b/N_0)}\,\log_2 e \tag{16.2-10}$$

or, equivalently, the normalized total capacity $C_n = K C_K/W$ is

$$C_n \le \log_2 e - \frac{1}{\mathcal{E}_b/N_0} = \frac{1}{\ln 2} - \frac{1}{\mathcal{E}_b/N_0} < \frac{1}{\ln 2} \tag{16.2-11}$$

FIGURE 16.2-3

Normalized capacity as a function of $\mathcal{E}_b/N_0$ for noncooperative CDMA.

In this case, we observe that the total capacity does not increase with K as in TDMA
and FDMA.
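The saturation of the total capacity under single-user detection can be seen numerically. The Python sketch below evaluates Equation 16.2-8 and the resulting total capacity $K C_K$; the function names and the normalized values $W = P = N_0 = 1$ are assumptions chosen for illustration only.

```python
import math


def cdma_single_user_detection_capacity(W, P, N0, K):
    """Per-user capacity when the other K-1 users act as Gaussian noise (Eq. 16.2-8)."""
    return W * math.log2(1.0 + P / (W * N0 + (K - 1) * P))


def cdma_total_capacity(W, P, N0, K):
    """Sum of the K per-user capacities under single-user detection."""
    return K * cdma_single_user_detection_capacity(W, P, N0, K)


if __name__ == "__main__":
    W, P, N0 = 1.0, 1.0, 1.0  # normalized, hypothetical values
    limit = W / math.log(2)   # W * log2(e), the large-K ceiling
    for K in (1, 10, 100, 1000, 10000):
        total = cdma_total_capacity(W, P, N0, K)
        print(f"K={K}: total {total:.4f} bits/s (ceiling {limit:.4f})")
```

For these values the total capacity climbs toward $W/\ln 2$ and then stays there, in contrast to the unbounded growth of the FDMA and TDMA totals.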

On the other hand, suppose that the K users cooperate by transmitting their coded
signals synchronously in time, and the multiuser receiver jointly demodulates and
decodes all the users' signals. This is called multiuser detection and decoding. Each
user is assigned a rate $R_i$, $1 \le i \le K$, and a code book containing a set of $2^{nR_i}$
codewords of power P. In each signal interval, each user selects an arbitrary codeword,
say $\mathbf{X}_i$, from its own code book, and all users transmit their codewords simultaneously.
Thus, the decoder at the receiver observes

$$\mathbf{Y} = \sum_{i=1}^{K} \mathbf{X}_i + \mathbf{Z} \tag{16.2-12}$$

where $\mathbf{Z}$ is an additive noise vector. The optimum decoder looks for the K codewords,
one from each code book, that have a vector sum closest to the received vector $\mathbf{Y}$ in
Euclidean distance.

The achievable K-dimensional rate region for the K users in an AWGN channel,
assuming equal power for each user, is given by the following equations:

$$R_i < W \log_2\left(1 + \frac{P}{W N_0}\right), \quad 1 \le i \le K \tag{16.2-13}$$

$$R_i + R_j < W \log_2\left(1 + \frac{2P}{W N_0}\right), \quad 1 \le i, j \le K \tag{16.2-14}$$

$$\vdots$$

$$R_{sum} = \sum_{i=1}^{K} R_i < W \log_2\left(1 + \frac{KP}{W N_0}\right) \tag{16.2-15}$$

where $R_{sum}$ is the total (sum) rate achieved by the K users by employing multiuser
detection. In the special case when all the rates are identical, the inequality 16.2-15 is
dominant over the other K - 1 inequalities. It follows that if the rates $\{R_i, 1 \le i \le K\}$
for the K cooperative synchronous users are selected to fall in the capacity region
specified by the inequalities given above, then the probabilities of error for the K users
tend to zero as the code block length n tends to infinity.

From the above discussion, we conclude that the sum of the rates of the K users,
$R_{sum}$, goes to infinity with K. Therefore, with coded synchronous transmission and
joint detection and decoding, the capacity of CDMA has a form similar to that of
FDMA and TDMA. Note that if all the rates in the CDMA system are selected to be
identical to R, then Equation 16.2-15 reduces to

$$R < \frac{W}{K} \log_2\left(1 + \frac{KP}{W N_0}\right) \tag{16.2-16}$$

which is the highest possible rate and is identical to the rate constraint in FDMA and
TDMA. In this case, CDMA does not yield a higher rate than TDMA and FDMA.
However, if the rates of the K users are selected to be unequal such that the inequalities
16.2-13 to 16.2-15 are satisfied, then it is possible to find the points in the achievable
rate region such that the sum of the rates for the K users in CDMA exceeds the capacity
of FDMA and TDMA.


EXAMPLE 16.2-1. Consider the case of two users in a CDMA system that employs
coded signals as described above. The rates of the two users must satisfy the inequalities

$$R_1 < W \log_2\left(1 + \frac{P}{W N_0}\right) \tag{16.2-17}$$

$$R_2 < W \log_2\left(1 + \frac{P}{W N_0}\right) \tag{16.2-18}$$

$$R_1 + R_2 < W \log_2\left(1 + \frac{2P}{W N_0}\right) \tag{16.2-19}$$

where P is the average transmitted power of each user and W is the signal bandwidth.

The capacity region for the two-user CDMA system with coded signal waveforms
has the form illustrated in Figure 16.2-4, where

$$C_i = W \log_2\left(1 + \frac{P_i}{W N_0}\right), \quad i = 1, 2$$


are the capacities corresponding to the two users with $P_1 = P_2 = P$. We note that if
user 1 is transmitting at capacity $C_1$, user 2 can transmit up to a maximum rate

$$R_{2m} = W \log_2\left(1 + \frac{2P}{W N_0}\right) - C_1 = W \log_2\left(1 + \frac{P}{P + W N_0}\right) \tag{16.2-20}$$

which is illustrated in Figure 16.2-4 as point A. This result has an interesting
interpretation. We note that rate $R_{2m}$ corresponds to the case in which the signal from user 1 is
considered as an equivalent additive noise in the detection of the signal of user 2. On the
other hand, user 1 can transmit at capacity $C_1$, since the receiver knows the transmitted
signal from user 2 and, hence, it can eliminate its effect in detecting the signal of user 1.

FIGURE 16.2-4

Capacity region of two-user CDMA multiple access Gaussian channel.

Because of symmetry, a similar situation exists if user 2 is transmitting at capacity
$C_2$. Then user 1 can transmit up to a maximum rate $R_{1m} = R_{2m}$, which is illustrated in
Figure 16.2-4 as point B. In this case, we have a similar interpretation as above, with
an interchange in the roles of user 1 and user 2.

The points A and B are connected by a straight line, which is defined by
Equation 16.2-19. It is easily seen that this straight line is the boundary of the achievable
rate region, since any point on the line corresponds to the maximum rate
$W \log_2(1 + 2P/W N_0)$, which can be obtained by simply time sharing the channel between
the two users.
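The corner points of the two-user region in this example can be computed directly from Equations 16.2-17 to 16.2-20. The Python sketch below is only an illustration; the function names, the dictionary layout, and the normalized values used in the demonstration are assumptions.

```python
import math


def two_user_corner_points(W, P, N0):
    """Corner points A and B of the two-user region, each as an (R1, R2) pair."""
    c_single = W * math.log2(1.0 + P / (W * N0))     # bound in Eqs. 16.2-17/18
    c_sum = W * math.log2(1.0 + 2.0 * P / (W * N0))  # sum-rate bound, Eq. 16.2-19
    r_2m = c_sum - c_single                          # Eq. 16.2-20
    return {"A": (c_single, r_2m), "B": (r_2m, c_single), "sum": c_sum}


def in_region(r1, r2, W, P, N0):
    """True if the rate pair satisfies Eqs. 16.2-17 to 16.2-19."""
    pts = two_user_corner_points(W, P, N0)
    c_single = pts["A"][0]
    return r1 < c_single and r2 < c_single and (r1 + r2) < pts["sum"]


if __name__ == "__main__":
    pts = two_user_corner_points(1.0, 1.0, 1.0)  # normalized, hypothetical values
    print("A =", pts["A"], " B =", pts["B"], " sum rate =", pts["sum"])
```

Every rate pair on the segment between A and B has the same sum rate, which is the sense in which time sharing between the two corner points traces out the boundary.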

In the next section, we consider the problem of signal detection for a multiuser 
CDMA system and assess the performance and the computational complexity of several 
receiver structures. 

16.3 MULTIUSER DETECTION IN CDMA SYSTEMS

As we have observed, TDMA and FDMA are multiple access methods in which the 
channel is partitioned into independent, single-user subchannels, i.e., non-overlapping 
time slots or frequency bands, respectively. In CDMA, each user is assigned a distinct 
signature sequence (or waveform), which the user employs to modulate and spread 
the information-bearing signal. The signature sequences also allow the receiver to 
demodulate the message transmitted by multiple users of the channel, who transmit 
simultaneously and, generally, asynchronously. 

In this section, we treat the demodulation and detection of multiuser uncoded 
CDMA signals. We shall see that the optimum maximum-likelihood detector has a 
computational complexity that grows exponentially with the number of users. Such a 
high complexity serves as a motivation to devise suboptimum detectors having lower 
computational complexities. Finally, we consider the performance characteristics of 
the various detectors.

16.3-1 CDMA Signal and Channel Models


Let us consider a CDMA channel that is shared by K simultaneous users. Each user is
assigned a signature waveform $g_k(t)$ of duration T, where T is the symbol interval. A
signature waveform may be expressed as

$$g_k(t) = \sum_{n=0}^{L-1} a_k(n)\,p(t - nT_c), \quad 0 \le t \le T \tag{16.3-1}$$

where $\{a_k(n), 0 \le n \le L - 1\}$ is a pseudonoise (PN) code sequence consisting of L
chips that take values $\{\pm 1\}$, $p(t)$ is a pulse of duration $T_c$, and $T_c$ is the chip interval.
Thus, we have L chips per symbol and $T = LT_c$. Without loss of generality, we assume
that all K signature waveforms have unit energy, i.e.,

$$\int_0^T g_k^2(t)\,dt = 1 \tag{16.3-2}$$

The cross correlations between pairs of signature waveforms play an important role
in the metrics for the signal detector and in its performance. We define the following
cross correlations, where $0 \le \tau \le T$ and $i < j$:

$$\rho_{ij}(\tau) = \int_{\tau}^{T} g_i(t)\,g_j(t - \tau)\,dt \tag{16.3-3}$$

$$\rho_{ji}(\tau) = \int_{0}^{\tau} g_i(t)\,g_j(t + T - \tau)\,dt \tag{16.3-4}$$

The cross correlations in Equations 16.3-3 and 16.3-4 apply to asynchronous
transmissions among the K users. For synchronous transmission, we need only $\rho_{ij}(0)$.
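A chip-rate discretization makes the two partial cross correlations concrete. The Python sketch below assumes a rectangular chip pulse and a delay that is an integer number m of chips, so that the integrals in Equations 16.3-3 and 16.3-4 reduce to finite sums; these assumptions, the function names, and the two length-7 sequences are illustrative and are not part of the text.

```python
import math


def signature(chips):
    """Unit-energy chip-rate samples of g_k(t) for a +/-1 sequence
    (rectangular chip pulse assumed)."""
    L = len(chips)
    return [c / math.sqrt(L) for c in chips]


def rho_ij(gi, gj, m):
    """Eq. 16.3-3 with tau = m*Tc: overlap of g_i with the current symbol of user j."""
    L = len(gi)
    return sum(gi[n] * gj[n - m] for n in range(m, L))


def rho_ji(gi, gj, m):
    """Eq. 16.3-4 with tau = m*Tc: overlap of g_i with the tail of the
    previous symbol of user j."""
    L = len(gi)
    return sum(gi[n] * gj[n + L - m] for n in range(0, m))


if __name__ == "__main__":
    g1 = signature([1, 1, 1, -1, 1, -1, -1])   # hypothetical PN sequences
    g2 = signature([1, -1, 1, 1, -1, -1, 1])
    for m in range(7):
        print(m, round(rho_ij(g1, g2, m), 4), round(rho_ji(g1, g2, m), 4))
```

With m = 0 the second term vanishes and the first reduces to the synchronous cross correlation $\rho_{ij}(0)$; for other delays the two terms together make up the periodic cross correlation of the two sequences.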

For simplicity, we assume that binary antipodal signals are used to transmit the
information from each user. Hence, let the information sequence of the kth user be
denoted by $\{b_k(m)\}$, where the value of each information bit may be $\pm 1$. It is convenient
to consider the transmission of a block of bits of some arbitrary length, say N. Then,
the data block from the kth user is

$$\mathbf{b}_k = [b_k(1) \; \cdots \; b_k(N)]^t \tag{16.3-5}$$

and the corresponding equivalent lowpass, transmitted waveform may be expressed as

$$s_k(t) = \sqrt{\mathcal{E}_k} \sum_{i=1}^{N} b_k(i)\,g_k(t - iT) \tag{16.3-6}$$

where $\mathcal{E}_k$ is the signal energy per bit. The composite transmitted signal for the K users
may be expressed as

$$s(t) = \sum_{k=1}^{K} s_k(t - \tau_k) = \sum_{k=1}^{K} \sqrt{\mathcal{E}_k} \sum_{i=1}^{N} b_k(i)\,g_k(t - iT - \tau_k) \tag{16.3-7}$$

where $\{\tau_k\}$ are the transmission delays, which satisfy the condition $0 \le \tau_k < T$ for
$1 \le k \le K$. Without loss of generality, we assume that $0 \le \tau_1 \le \tau_2 \le \cdots \le \tau_K < T$.
This is the model for the multiuser transmitted signal in an asynchronous mode. In the
special case of synchronous transmission, $\tau_k = 0$ for $1 \le k \le K$.

The transmitted signal is assumed to be corrupted by AWGN. Hence, the received
signal may be expressed as

$$r(t) = s(t) + n(t) \tag{16.3-8}$$

where $s(t)$ is given by Equation 16.3-7 and $n(t)$ is the noise, with power spectral
density $\frac{1}{2}N_0$.


16.3-2 The Optimum Multiuser Receiver

The optimum receiver is defined as the receiver that selects the most probable sequence
of bits $\{b_k(n), 1 \le n \le N, 1 \le k \le K\}$ given the received signal $r(t)$ observed over
the time interval $0 \le t \le NT + 2T$. First, let us consider the case of synchronous
transmission; later, we shall consider asynchronous transmission.

Synchronous transmission In synchronous transmission, each (user) interferer
produces exactly one symbol which interferes with the desired symbol. In additive
white Gaussian noise, it is sufficient to consider the signal received in one signal
interval, say $0 \le t \le T$, and determine the optimum receiver. Hence, $r(t)$ may be
expressed as

$$r(t) = \sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,b_k(1)\,g_k(t) + n(t), \quad 0 \le t \le T \tag{16.3-9}$$

The optimum maximum-likelihood receiver computes the log-likelihood function

$$\Lambda(\mathbf{b}) = \int_0^T \left[r(t) - \sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,b_k(1)\,g_k(t)\right]^2 dt \tag{16.3-10}$$

and selects the information sequence $\{b_k(1), 1 \le k \le K\}$ that minimizes $\Lambda(\mathbf{b})$. If we
expand the integral in Equation 16.3-10, we obtain

$$\Lambda(\mathbf{b}) = \int_0^T r^2(t)\,dt - 2\sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,b_k(1) \int_0^T r(t)\,g_k(t)\,dt + \sum_{j=1}^{K}\sum_{k=1}^{K} \sqrt{\mathcal{E}_j \mathcal{E}_k}\,b_k(1)\,b_j(1) \int_0^T g_k(t)\,g_j(t)\,dt \tag{16.3-11}$$
We observe that the integral involving $r^2(t)$ is common to all possible sequences $\{b_k(1)\}$
and is of no relevance in determining which sequence was transmitted. Hence, it may
be neglected. The term

$$r_k = \int_0^T r(t)\,g_k(t)\,dt, \quad 1 \le k \le K \tag{16.3-12}$$

represents the cross correlation of the received signal with each of the K signature
sequences. Instead of cross correlators, we may employ matched filters. Finally, the
integral involving $g_k(t)$ and $g_j(t)$ is simply

$$\rho_{jk}(0) = \int_0^T g_j(t)\,g_k(t)\,dt \tag{16.3-13}$$

Therefore, Equation 16.3-11 may be expressed in the form of correlation metrics

$$C(\mathbf{r}_K, \mathbf{b}_K) = 2\sum_{k=1}^{K} \sqrt{\mathcal{E}_k}\,b_k(1)\,r_k - \sum_{j=1}^{K}\sum_{k=1}^{K} \sqrt{\mathcal{E}_j \mathcal{E}_k}\,b_k(1)\,b_j(1)\,\rho_{jk}(0) \tag{16.3-14}$$

These correlation metrics may also be expressed in vector inner product form as

$$C(\mathbf{r}_K, \mathbf{b}_K) = 2\mathbf{b}_K^t \mathbf{r}_K - \mathbf{b}_K^t \mathbf{R}_s \mathbf{b}_K \tag{16.3-15}$$

where

$$\mathbf{r}_K = [r_1 \; r_2 \; \cdots \; r_K]^t, \qquad \mathbf{b}_K = [\sqrt{\mathcal{E}_1}\,b_1(1) \; \cdots \; \sqrt{\mathcal{E}_K}\,b_K(1)]^t$$

and $\mathbf{R}_s$ is the correlation matrix, with elements $\rho_{jk}(0)$. It is observed that the optimum
detector must have knowledge of the received signal energies in order to compute the
correlation metrics. Figure 16.3-1 depicts the optimum multiuser receiver.

There are $2^K$ possible choices of the bits in the information sequence of the
K users. The optimum detector computes the correlation metrics for each sequence
and selects the sequence that yields the largest correlation metric. We observe that
the optimum detector has a complexity that grows exponentially with the number of
users, K.

In summary, the optimum receiver for symbol-synchronous transmission consists
of a bank of K correlators or matched filters followed by a detector that computes the $2^K$
correlation metrics given by Equation 16.3-15 corresponding to the $2^K$ possible
transmitted information sequences. Then, the detector selects the sequence corresponding
to the largest correlation metric.
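The exhaustive search described above can be sketched in a few lines of Python. The sketch assumes the synchronous model, in which the correlator outputs are $\mathbf{r}_K = \mathbf{R}_s \mathbf{b}_K + \mathbf{n}$ (this follows from substituting Equation 16.3-9 into Equation 16.3-12); the function names and the sample correlation matrix and energies are illustrative assumptions.

```python
import itertools
import math


def correlation_metric(r, b, Rs):
    """C(r, b) = 2 b^t r - b^t R_s b (Eq. 16.3-15); b holds sqrt(E_k) * b_k(1)."""
    K = len(b)
    linear = sum(b[k] * r[k] for k in range(K))
    quadratic = sum(b[j] * Rs[j][k] * b[k] for j in range(K) for k in range(K))
    return 2.0 * linear - quadratic


def ml_detect(r, energies, Rs):
    """Evaluate the metric for all 2^K bit vectors and return the best one."""
    K = len(r)
    best_bits, best_metric = None, -math.inf
    for bits in itertools.product((-1, 1), repeat=K):
        b = [math.sqrt(energies[k]) * bits[k] for k in range(K)]
        metric = correlation_metric(r, b, Rs)
        if metric > best_metric:
            best_bits, best_metric = bits, metric
    return best_bits


def matched_filter_outputs(bits, energies, Rs, noise=None):
    """Correlator outputs r = R_s b + n for the synchronous model (n defaults to 0)."""
    K = len(bits)
    b = [math.sqrt(energies[k]) * bits[k] for k in range(K)]
    n = noise if noise is not None else [0.0] * K
    return [sum(Rs[k][j] * b[j] for j in range(K)) + n[k] for k in range(K)]


if __name__ == "__main__":
    # Hypothetical near-far case: user 2 is 6 dB stronger than user 1.
    Rs = [[1.0, 0.5], [0.5, 1.0]]
    energies = [1.0, 4.0]
    r = matched_filter_outputs((1, -1), energies, Rs)
    print("correlator outputs:", r, "-> ML decision:", ml_detect(r, energies, Rs))
```

In this example the correlator output for user 1 is exactly zero, so a sign decision on that output alone would be uninformative, whereas the joint metric still identifies the transmitted pair. The loop over `itertools.product` makes the $2^K$ growth in complexity explicit.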

Asynchronous transmission In this case, there are exactly two consecutive symbols
from each interferer that overlap a desired symbol. We assume that the receiver
knows the received signal energies $\{\mathcal{E}_k\}$ for the K users and the transmission delays
$\{\tau_k\}$. Clearly, these parameters must be measured at the receiver or provided to the
receiver as side information by the users via some control channel.

FIGURE 16.3-1

Optimum multiuser receiver for synchronous transmission.


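The optimum maximum-likelihood receiver computes the log-likelihood function

$$\begin{aligned}
\Lambda(\mathbf{b}) &= \int_0^{NT+2T} \Big[ r(t) - \sum_{k=1}^{K} \sqrt{\mathcal{E}_k} \sum_{i=1}^{N} b_k(i)\, g_k(t - iT - \tau_k) \Big]^2 dt \\
&= \int_0^{NT+2T} r^2(t)\, dt - 2\sum_{k=1}^{K} \sqrt{\mathcal{E}_k} \sum_{i=1}^{N} b_k(i) \int_0^{NT+2T} r(t)\, g_k(t - iT - \tau_k)\, dt \\
&\quad + \sum_{k=1}^{K}\sum_{l=1}^{K} \sqrt{\mathcal{E}_k \mathcal{E}_l} \sum_{i=1}^{N}\sum_{j=1}^{N} b_k(i)\, b_l(j) \int_0^{NT+2T} g_k(t - iT - \tau_k)\, g_l(t - jT - \tau_l)\, dt
\end{aligned} \tag{16.3-16}$$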

where $\mathbf{b}$ represents the data sequences from the $K$ users. The integral involving $r^2(t)$
may be ignored, since it is common to all possible information sequences. The integral

$$r_k(i) \equiv \int_{iT+\tau_k}^{(i+1)T+\tau_k} r(t)\, g_k(t - iT - \tau_k)\, dt, \qquad 1 \le i \le N \tag{16.3-17}$$

represents the outputs of the correlator or matched filter for the $k$th user in each of the
signal intervals. Finally, the integral

$$\int_0^{NT+2T} g_k(t - iT - \tau_k)\, g_l(t - jT - \tau_l)\, dt = \int_{-iT-\tau_k}^{NT+2T-iT-\tau_k} g_k(t)\, g_l(t + iT - jT + \tau_k - \tau_l)\, dt \tag{16.3-18}$$


may be easily decomposed into terms involving the cross correlation
$\rho_{kl}(\tau) = \rho_{kl}(\tau_l - \tau_k)$ for $k \le l$ and $\rho_{lk}(\tau)$ for $k > l$. Therefore, we observe that the
log-likelihood function may be expressed in terms of a correlation metric that involves the
outputs $\{r_k(i),\ 1 \le k \le K,\ 1 \le i \le N\}$ of $K$ correlators or matched filters, one for each of the
$K$ signature sequences. Using vector notation, it can be shown that the $NK$ correlator
or matched filter outputs $\{r_k(i)\}$ can be expressed in the form


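$$\mathbf{r} = \mathbf{R}_N \mathbf{b} + \mathbf{n} \tag{16.3-19}$$

where, by definition

$$\mathbf{r} = [\mathbf{r}^t(1) \;\; \mathbf{r}^t(2) \;\; \cdots \;\; \mathbf{r}^t(N)]^t, \qquad \mathbf{r}(i) = [r_1(i) \;\; r_2(i) \;\; \cdots \;\; r_K(i)]^t \tag{16.3-20}$$

$$\mathbf{b} = [\mathbf{b}^t(1) \;\; \mathbf{b}^t(2) \;\; \cdots \;\; \mathbf{b}^t(N)]^t, \qquad \mathbf{b}(i) = [\sqrt{\mathcal{E}_1}\, b_1(i) \;\; \sqrt{\mathcal{E}_2}\, b_2(i) \;\; \cdots \;\; \sqrt{\mathcal{E}_K}\, b_K(i)]^t \tag{16.3-21}$$

$$\mathbf{n} = [\mathbf{n}^t(1) \;\; \mathbf{n}^t(2) \;\; \cdots \;\; \mathbf{n}^t(N)]^t, \qquad \mathbf{n}(i) = [n_1(i) \;\; n_2(i) \;\; \cdots \;\; n_K(i)]^t \tag{16.3-22}$$

$$\mathbf{R}_N = \begin{bmatrix}
\mathbf{R}_a(0) & \mathbf{R}_a^t(1) & \mathbf{0} & \cdots & \cdots & \mathbf{0} \\
\mathbf{R}_a(1) & \mathbf{R}_a(0) & \mathbf{R}_a^t(1) & \mathbf{0} & \cdots & \mathbf{0} \\
\vdots & \ddots & \ddots & \ddots & & \vdots \\
\mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}_a(1) & \mathbf{R}_a(0) & \mathbf{R}_a^t(1) \\
\mathbf{0} & \cdots & \cdots & \mathbf{0} & \mathbf{R}_a(1) & \mathbf{R}_a(0)
\end{bmatrix} \tag{16.3-23}$$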


and $\mathbf{R}_a(m)$ is a $K \times K$ matrix with elements

$$R_{kl}(m) = \int_{-\infty}^{\infty} g_k(t - \tau_k)\, g_l(t + mT - \tau_l)\, dt \tag{16.3-24}$$


The Gaussian noise vectors $\mathbf{n}(i)$ have zero mean and autocorrelation matrix

$$E[\mathbf{n}(k)\, \mathbf{n}^t(j)] = \tfrac{1}{2} N_0\, \mathbf{R}_a(k - j) \tag{16.3-25}$$

Note that the vector $\mathbf{r}$ given by Equation 16.3-19 constitutes a set of sufficient statistics
for estimating the transmitted bits $b_k(i)$.
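
To make the block structure of Equation 16.3-23 concrete, the following sketch assembles $\mathbf{R}_N$ from the two $K \times K$ blocks $\mathbf{R}_a(0)$ and $\mathbf{R}_a(1)$. The helper names and the numerical entries are assumptions chosen for illustration only; in a real system the blocks would be computed from the signature waveforms and delays via Equation 16.3-24.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def build_RN(Ra0, Ra1, N):
    """Assemble the NK x NK block-tridiagonal matrix of Equation 16.3-23:
    Ra(0) on the block diagonal, Ra(1) just below it, Ra(1)^t just above it."""
    K = len(Ra0)
    Ra1_t = transpose(Ra1)
    RN = [[0.0] * (N * K) for _ in range(N * K)]
    for i in range(N):
        for j in range(N):
            if i == j:
                block = Ra0
            elif i == j + 1:
                block = Ra1      # coupling with the previous symbol interval
            elif i + 1 == j:
                block = Ra1_t    # coupling with the next symbol interval
            else:
                continue         # no overlap beyond adjacent intervals
            for a in range(K):
                for c in range(K):
                    RN[i * K + a][j * K + c] = block[a][c]
    return RN

# Hypothetical two-user blocks (illustrative numbers, not from the text).
Ra0 = [[1.0, 0.3], [0.3, 1.0]]
Ra1 = [[0.0, 0.2], [0.0, 0.0]]
RN = build_RN(Ra0, Ra1, N=3)
```

Because the upper blocks are the transposes of the lower ones and $\mathbf{R}_a(0)$ is symmetric, the assembled matrix is symmetric, as a correlation matrix should be.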

If we adopt a block processing approach, the optimum ML detector must compute
$2^{NK}$ correlation metrics and select the $K$ sequences of length $N$ that correspond
to the largest correlation metric. Clearly, such an approach is much too complex
computationally to be implemented in practice, especially when $K$ and $N$ are large. An
alternative approach is ML sequence estimation employing the Viterbi algorithm. In
order to construct a sequential-type detector, we make use of the fact that each
transmitted symbol overlaps at most with $2K - 2$ symbols. Thus, a significant reduction in
computational complexity is obtained with respect to the block size parameter $N$, but
the exponential dependence on $K$ cannot be reduced.

It is apparent that the optimum ML receiver employing the Viterbi algorithm 
involves such a high computational complexity that its use in practice is limited to 
communication systems where the number of users is extremely small, e.g., $K < 10$.
For larger values of K, one may consider a sequential-type detector that is akin 
to either the sequential decoding or the stack algorithms described in Chapter 8. 
Below, we consider a number of suboptimum detectors whose complexity grows lin- 
early with K. 


16.3-3 Suboptimum Detectors 

In the above discussion, we observed that the optimum detector for the K CDMA users 
has a computational complexity, measured in the number of arithmetic operations (ad- 
ditions and multiplications/divisions) per modulated symbol, that grows exponentially 
with K. In this subsection we describe suboptimum detectors with computational com- 
plexities that grow linearly with the number of users, K. We begin with the simplest 
suboptimum detector, which we call the conventional (single-user) detector. 

Conventional single-user detector In conventional single-user detection, the re- 
ceiver for each user consists of a demodulator that correlates (or match-filters) the 
received signal with the signature sequence of the user and passes the correlator output 
to the detector, which makes a decision based on the single correlator output. Thus, 
the conventional detector neglects the presence of the other users of the channel or, 
equivalently, assumes that the aggregate noise plus interference is white and Gaussian. 

Let us consider synchronous transmission. Then, the output of the correlator for
the $k$th user for the signal in the interval $0 \le t \le T$ is

$$r_k = \int_0^T r(t)\, g_k(t)\, dt = \sqrt{\mathcal{E}_k}\, b_k(1) + \sum_{\substack{j=1 \\ j \ne k}}^{K} \sqrt{\mathcal{E}_j}\, b_j(1)\, \rho_{jk}(0) + n_k(1) \tag{16.3-27}$$

where the noise component $n_k(1)$ is given as

$$n_k(1) = \int_0^T n(t)\, g_k(t)\, dt \tag{16.3-28}$$

Since $n(t)$ is white Gaussian noise with power spectral density $\tfrac{1}{2} N_0$, the variance of
$n_k(1)$ is

$$E[n_k^2(1)] = \tfrac{1}{2} N_0 \int_0^T g_k^2(t)\, dt = \tfrac{1}{2} N_0 \tag{16.3-29}$$

Clearly, if the signature sequences are orthogonal, the interference from the other users
given by the middle term in Equation 16.3-27 vanishes and the conventional single-user
detector is optimum. On the other hand, if one or more of the other signature sequences
are not orthogonal to the user signature sequence, the interference from the other users
can become excessive if the power levels of the signals (or the received signal energies)
of one or more of the other users are sufficiently larger than the power level of the $k$th user.
This situation is generally called the near-far problem in multiuser communications,
and necessitates some type of power control for conventional detection.
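
The near-far effect can be seen directly from Equation 16.3-27 with the noise set to zero. The sketch below is only an illustration under assumed values (two users, cross correlation 0.4, and hypothetical helper names); it compares the conventional decisions when the users have equal energies with the case in which the second user is received with nine times the energy.

```python
from math import sqrt

def conventional_outputs(bits, energies, rho):
    """Noise-free correlator outputs of Equation 16.3-27 for synchronous users.
    rho[j][k] is the cross correlation rho_jk(0), with rho[k][k] = 1."""
    K = len(bits)
    return [sum(sqrt(energies[j]) * bits[j] * rho[j][k] for j in range(K))
            for k in range(K)]

def sign(x):
    return 1 if x >= 0 else -1

rho = [[1.0, 0.4], [0.4, 1.0]]   # assumed cross correlation
bits = [1, -1]                   # transmitted bits of users 1 and 2

# Equal energies: the interference is smaller than the desired amplitude.
balanced = [sign(v) for v in conventional_outputs(bits, [1.0, 1.0], rho)]

# User 2 is much stronger: its interference flips the decision for user 1.
near_far = [sign(v) for v in conventional_outputs(bits, [1.0, 9.0], rho)]
```

In the second case the interference term $\sqrt{\mathcal{E}_2}\,\rho = 1.2$ exceeds the desired amplitude $\sqrt{\mathcal{E}_1} = 1$, so user 1 is detected in error even without noise.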

In asynchronous transmission, the conventional detector is more vulnerable to 
interference from other users. This is because it is not possible to design signature 
sequences for any pair of users that are orthogonal for all time offsets. Consequently, 
interference from other users is unavoidable in asynchronous transmission with the 
conventional single-user detection. In such a case, the near-far problem resulting from 
unequal power in the signals transmitted by the various users is particularly serious. 
The practical solution generally requires a power adjustment method that is controlled 
by the receiver via a separate communication channel that all users are continuously 
monitoring. Another option is to employ one of the multiuser detectors described below. 


Decorrelating detector We observe that the conventional detector has a complexity 
that grows linearly with the number of users, but its vulnerability to the near-far problem 
requires some type of power control. We shall now devise another type of detector that 
also has a linear computational complexity but does not exhibit the vulnerability to 
other-user interference. 

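Let us first consider the case of symbol-synchronous transmission. In this case, the
received signal vector $\mathbf{r}_K$ that represents the output of the $K$ matched filters is

$$\mathbf{r}_K = \mathbf{R}_s \mathbf{b}_K + \mathbf{n}_K \tag{16.3-30}$$

where $\mathbf{b}_K = [\sqrt{\mathcal{E}_1}\, b_1(1) \;\; \sqrt{\mathcal{E}_2}\, b_2(1) \;\; \cdots \;\; \sqrt{\mathcal{E}_K}\, b_K(1)]^t$ and the noise vector with
elements $\mathbf{n}_K = [n_1(1) \;\; n_2(1) \;\; \cdots \;\; n_K(1)]^t$ has a covariance

$$E(\mathbf{n}_K \mathbf{n}_K^t) = \frac{N_0}{2}\, \mathbf{R}_s \tag{16.3-31}$$

Since the noise is Gaussian, $\mathbf{r}_K$ is described by a $K$-dimensional Gaussian PDF with
mean $\mathbf{R}_s \mathbf{b}_K$ and covariance $\tfrac{1}{2} N_0 \mathbf{R}_s$. That is,

$$p(\mathbf{r}_K \mid \mathbf{b}_K) = \frac{1}{\sqrt{(N_0 \pi)^K \det \mathbf{R}_s}} \exp\Big[ -\frac{1}{N_0} (\mathbf{r}_K - \mathbf{R}_s \mathbf{b}_K)^t \mathbf{R}_s^{-1} (\mathbf{r}_K - \mathbf{R}_s \mathbf{b}_K) \Big] \tag{16.3-32}$$

The best linear estimate of $\mathbf{b}_K$ is the value of $\mathbf{b}_K$ that minimizes the quadratic
log-likelihood function

$$\Lambda(\mathbf{b}_K) = (\mathbf{r}_K - \mathbf{R}_s \mathbf{b}_K)^t \mathbf{R}_s^{-1} (\mathbf{r}_K - \mathbf{R}_s \mathbf{b}_K) \tag{16.3-33}$$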


FIGURE 16.3-2 

Receiver structure for decorrelation receiver. 


The result of this minimization yields

$$\mathbf{b}_K^0 = \mathbf{R}_s^{-1} \mathbf{r}_K \tag{16.3-34}$$

Then, the detected symbols are obtained by taking the sign of each element of $\mathbf{b}_K^0$, i.e.,

$$\hat{\mathbf{b}}_K = \mathrm{sgn}(\mathbf{b}_K^0) \tag{16.3-35}$$

Figure 16.3-2 illustrates the receiver structure. Since the estimate $\mathbf{b}_K^0$ is obtained by
performing a linear transformation on the vector of correlator outputs, the computational
complexity is linear in $K$.
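
As an illustration of Equations 16.3-34 and 16.3-35, the sketch below solves $\mathbf{R}_s \mathbf{b}_K^0 = \mathbf{r}_K$ by Gaussian elimination rather than forming the inverse explicitly. The solver, the function names, and the numbers are assumptions made for the example; the test vector is the noise-free correlator output for two users with $\rho = 0.4$, $\mathcal{E}_1 = 1$, $\mathcal{E}_2 = 9$, and bits $(+1, -1)$.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(A[i]) + [y[i]] for i in range(n)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def decorrelating_detector(r, Rs):
    """Return hard decisions sgn(Rs^{-1} r) together with the soft estimate."""
    soft = solve(Rs, r)
    return [1 if v >= 0 else -1 for v in soft], soft

Rs = [[1.0, 0.4], [0.4, 1.0]]
r = [1.0 - 0.4 * 3.0, 0.4 * 1.0 - 3.0]   # noise-free outputs, strong user 2
decisions, soft = decorrelating_detector(r, Rs)
```

In this noise-free case the soft estimate recovers the amplitudes $(\sqrt{\mathcal{E}_1} b_1, \sqrt{\mathcal{E}_2} b_2) = (1, -3)$, so the weak user is detected correctly despite the strong interferer.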

The reader should observe that the best (maximum-likelihood) linear estimate of
$\mathbf{b}_K$ given by Equation 16.3-34 is different from the optimum non-linear ML sequence
detector that finds the best discrete-valued $\{\pm 1\}$ sequence that maximizes the likelihood
function. It is also interesting to note that the estimate $\mathbf{b}_K^0$ is the best linear estimate
that maximizes the correlation metric given by Equation 16.3-15.

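An interesting interpretation of the detector that computes $\mathbf{b}_K^0$ as in
Equation 16.3-34 and makes decisions according to Equation 16.3-35 is obtained by
considering the case of $K = 2$ users. In this case,

$$\mathbf{R}_s = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \tag{16.3-36}$$

$$\mathbf{R}_s^{-1} = \frac{1}{1 - \rho^2} \begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix} \tag{16.3-37}$$

where

$$\rho = \int_0^T g_1(t)\, g_2(t)\, dt \tag{16.3-38}$$

Then, if we correlate the received signal

$$r(t) = \sqrt{\mathcal{E}_1}\, b_1 g_1(t) + \sqrt{\mathcal{E}_2}\, b_2 g_2(t) + n(t) \tag{16.3-39}$$

with $g_1(t)$ and $g_2(t)$, we obtain

$$\mathbf{r}_2 = \begin{bmatrix} \sqrt{\mathcal{E}_1}\, b_1 + \rho \sqrt{\mathcal{E}_2}\, b_2 + n_1 \\ \rho \sqrt{\mathcal{E}_1}\, b_1 + \sqrt{\mathcal{E}_2}\, b_2 + n_2 \end{bmatrix} \tag{16.3-40}$$

where $n_1$ and $n_2$ are the noise components at the output of the correlators. Therefore,

$$\mathbf{b}_2^0 = \mathbf{R}_s^{-1} \mathbf{r}_2 = \begin{bmatrix} \sqrt{\mathcal{E}_1}\, b_1 + (n_1 - \rho n_2)/(1 - \rho^2) \\ \sqrt{\mathcal{E}_2}\, b_2 + (n_2 - \rho n_1)/(1 - \rho^2) \end{bmatrix} \tag{16.3-41}$$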


This is a very interesting result, because the transformation $\mathbf{R}_s^{-1}$ has eliminated the
interference components between the two users. Consequently, the near-far problem is
eliminated and there is no need for power control.

It is interesting to note that a result similar to Equation 16.3-41 is obtained if we
correlate $r(t)$ given by Equation 16.3-39 with two modified signature waveforms

$$g_1'(t) = g_1(t) - \rho\, g_2(t) \tag{16.3-42}$$

$$g_2'(t) = g_2(t) - \rho\, g_1(t) \tag{16.3-43}$$

This means that, by correlating the received signal with the modified signature
waveforms, we have tuned out or decorrelated the multiuser interference. Hence, the detector
based on Equation 16.3-34 is called a decorrelating detector.
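
The effect of the modified signature waveforms can be checked numerically with a discrete-time stand-in for the integrals. In the sketch below the waveforms are replaced by unit-energy chip sequences of length 4 and the correlators by inner products; this chip-level model and all numerical values are assumptions made for illustration.

```python
def inner(x, y):
    """Discrete-time stand-in for the correlation integral over one symbol."""
    return sum(a * b for a, b in zip(x, y))

# Assumed unit-energy chip sequences with cross correlation rho = 0.5.
g1 = [0.5, 0.5, 0.5, 0.5]
g2 = [0.5, 0.5, 0.5, -0.5]
rho = inner(g1, g2)

# Modified signature sequences of Equations 16.3-42 and 16.3-43.
g1_mod = [a - rho * b for a, b in zip(g1, g2)]
g2_mod = [b - rho * a for a, b in zip(g1, g2)]

# Noise-free received samples: weak user 1 (amplitude 1), strong user 2 (amplitude 3).
a1, a2 = 1.0, 3.0
b1, b2 = 1, -1
r = [a1 * b1 * x + a2 * b2 * y for x, y in zip(g1, g2)]

z1_conventional = inner(r, g1)      # contains the term rho * a2 * b2
z1 = inner(r, g1_mod)               # equals a1 * b1 * (1 - rho**2)
z2 = inner(r, g2_mod)               # equals a2 * b2 * (1 - rho**2)
```

The outputs of the modified correlators are the desired amplitudes scaled by $1 - \rho^2$, with no contribution from the other user, which is the result of Equation 16.3-41 up to a scale factor.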

In asynchronous transmission, the received signal at the output of the correlators
is given by Equation 16.3-19. Hence, the log-likelihood function is given as

$$\Lambda(\mathbf{b}) = (\mathbf{r} - \mathbf{R}_N \mathbf{b})^t \mathbf{R}_N^{-1} (\mathbf{r} - \mathbf{R}_N \mathbf{b}) \tag{16.3-44}$$

where $\mathbf{R}_N$ is defined by Equation 16.3-23 and $\mathbf{b}$ is given by Equation 16.3-21. It is
relatively easy to show that the vector $\mathbf{b}$ that minimizes $\Lambda(\mathbf{b})$ is

$$\mathbf{b}^0 = \mathbf{R}_N^{-1} \mathbf{r} \tag{16.3-45}$$

This is the ML estimate of $\mathbf{b}$ and it is again obtained by performing a linear
transformation of the outputs from the bank of correlators or matched filters.

Since $\mathbf{r} = \mathbf{R}_N \mathbf{b} + \mathbf{n}$, it follows from Equation 16.3-45 that

$$\mathbf{b}^0 = \mathbf{b} + \mathbf{R}_N^{-1} \mathbf{n} \tag{16.3-46}$$

Therefore, $\mathbf{b}^0$ is an unbiased estimate of $\mathbf{b}$. This means that the multiuser interference
has been eliminated, as in the case of symbol-synchronous transmission. Hence, this
detector for asynchronous transmission is also called a decorrelating detector.

A computationally efficient method for obtaining the solution given by
Equation 16.3-45 is the square-root factorization method described in Appendix D. Of
course, there are many other methods that may be used to invert the matrix $\mathbf{R}_N$. Iterative
methods to decorrelate the signals have also been explored.

Minimum mean-square-error detector In the above discussion, we showed that
the linear ML estimate of $\mathbf{b}$ is obtained by minimizing the quadratic log-likelihood
function in Equation 16.3-44. Thus, we obtained the result given by Equation 16.3-45,
which is an estimate derived by performing a linear transformation on the outputs of
the bank of correlators or matched filters.

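Another, somewhat different, solution is obtained if we seek the linear
transformation $\mathbf{b}^0 = \mathbf{A}\mathbf{r}$, where the matrix $\mathbf{A}$ is to be determined so as to minimize the mean
square error (MSE)

$$J(\mathbf{b}) = E[(\mathbf{b} - \mathbf{b}^0)^t (\mathbf{b} - \mathbf{b}^0)] = E[(\mathbf{b} - \mathbf{A}\mathbf{r})^t (\mathbf{b} - \mathbf{A}\mathbf{r})] \tag{16.3-47}$$

where the expectation is with respect to the data vector $\mathbf{b}$ and the additive noise $\mathbf{n}$. The
optimum matrix $\mathbf{A}$ may be found by forcing the error $(\mathbf{b} - \mathbf{A}\mathbf{r})$ to be orthogonal to the
data vector $\mathbf{r}$. Thus,

$$E[(\mathbf{b} - \mathbf{A}\mathbf{r})\, \mathbf{r}^t] = \mathbf{0}, \qquad E(\mathbf{b}\mathbf{r}^t) - \mathbf{A}\, E(\mathbf{r}\mathbf{r}^t) = \mathbf{0} \tag{16.3-48}$$

Let us consider the case of synchronous transmission. We have

$$E(\mathbf{b}_K \mathbf{r}_K^t) = E(\mathbf{b}_K \mathbf{b}_K^t)\, \mathbf{R}_s^t = \mathbf{D} \mathbf{R}_s^t \tag{16.3-49}$$

and

$$E(\mathbf{r}_K \mathbf{r}_K^t) = E[(\mathbf{R}_s \mathbf{b}_K + \mathbf{n}_K)(\mathbf{R}_s \mathbf{b}_K + \mathbf{n}_K)^t] = \mathbf{R}_s \mathbf{D} \mathbf{R}_s^t + \frac{N_0}{2}\, \mathbf{R}_s^t \tag{16.3-50}$$

where $\mathbf{D}$ is a diagonal matrix with diagonal elements $\{\mathcal{E}_k,\ 1 \le k \le K\}$. By substituting
Equations 16.3-49 and 16.3-50 into Equation 16.3-48 and solving for $\mathbf{A}$, we obtain

$$\mathbf{A}^0 = \Big( \mathbf{R}_s + \frac{N_0}{2}\, \mathbf{D}^{-1} \Big)^{-1} \tag{16.3-51}$$

Then,

$$\mathbf{b}_K^0 = \mathbf{A}^0 \mathbf{r}_K \tag{16.3-52}$$

and

$$\hat{\mathbf{b}}_K = \mathrm{sgn}(\mathbf{b}_K^0) \tag{16.3-53}$$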


Similarly, for asynchronous transmission, it can be shown that the optimum choice of
$\mathbf{A}$ that minimizes $J(\mathbf{b})$ is

$$\mathbf{A}^0 = (\mathbf{R}_N + \tfrac{1}{2} N_0 \mathbf{I})^{-1} \tag{16.3-54}$$

and, hence,

$$\mathbf{b}^0 = (\mathbf{R}_N + \tfrac{1}{2} N_0 \mathbf{I})^{-1} \mathbf{r} \tag{16.3-55}$$

The output of the detector is then $\hat{\mathbf{b}} = \mathrm{sgn}(\mathbf{b}^0)$.

The estimate given by Equation 16.3-52 or 16.3-55 is called the minimum MSE
(MMSE) estimate of $\mathbf{b}$. Note that when $\tfrac{1}{2} N_0$ is small compared with the diagonal
elements of $\mathbf{R}_N$, the MMSE solution approaches the ML solution given by
Equation 16.3-45. On the other hand, when the noise level is large compared with the signal
level in the diagonal elements of $\mathbf{R}_N$, $\mathbf{A}^0$ approaches a scaled identity matrix,
$(2/N_0)\mathbf{I}$. In this low-SNR case, the detector basically ignores the interference from other
users, because the additive noise is the dominant term. It should also be noted that
the MMSE criterion produces a biased estimate of $\mathbf{b}$. Hence, there is some residual
multiuser interference.

To perform the computations that lead to the values of $\mathbf{b}^0$, we solve the set of linear
equations

$$(\mathbf{R}_N + \tfrac{1}{2} N_0 \mathbf{I})\, \mathbf{b}^0 = \mathbf{r} \tag{16.3-56}$$

This solution may be computed efficiently using a square-root factorization of the matrix
$\mathbf{R}_N + \tfrac{1}{2} N_0 \mathbf{I}$ as indicated above. Thus, to detect $NK$ bits requires $3NK^2$
multiplications. Therefore, the computational complexity is $3K$ multiplications per bit, which is
independent of the block length $N$ and is linear in $K$.
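
A minimal sketch of solving Equation 16.3-56 by square-root (Cholesky) factorization is given below. It uses a plain dense factorization, so it does not exploit the banded structure of $\mathbf{R}_N$ that leads to the $3NK^2$ operation count quoted above; the function names and the two-user numbers are assumptions for illustration.

```python
from math import sqrt

def cholesky(A):
    """Lower-triangular L with A = L L^t (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = sqrt(s) if i == j else s / L[j][j]
    return L

def cholesky_solve(A, y):
    L = cholesky(A)
    n = len(y)
    z = [0.0] * n
    for i in range(n):                      # forward substitution: L z = y
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution: L^t x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def mmse_detector(r, R, N0):
    """Solve (R + N0/2 I) b0 = r and return hard decisions and soft estimate."""
    n = len(r)
    A = [[R[i][j] + (0.5 * N0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    soft = cholesky_solve(A, r)
    return [1 if v >= 0 else -1 for v in soft], soft

R = [[1.0, 0.4], [0.4, 1.0]]
r = [-0.2, -2.6]                 # correlator outputs used in the example
decisions, soft = mmse_detector(r, R, N0=0.2)
```

Adding $\tfrac{1}{2} N_0$ to the diagonal keeps the matrix positive definite, which is what makes the square-root factorization applicable.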

We observe that both the decorrelating detector and the MMSE detector exhibit the 
desirable property of being near-far resistant. In fact, in the case of the decorrelating 
detector, the interference from other users is completely eliminated. 

We also observe that both the decorrelating detector and the MMSE detector
described above involve performing linear transformations on a block of data obtained
from $K$ correlators or matched filters. The linear transformations are akin to the linear
equalization of intersymbol interference treated in Chapter 9. In fact, the decorrelating
detector is akin to the zero-forcing linear equalizer, and the MMSE detector is akin to
the linear MMSE equalizer. Consequently, these multiuser detectors for asynchronous
transmission can be implemented by employing a tapped-delay-line filter with
adjustable coefficients for each user and selecting the filter coefficients to either eliminate
the interuser interference or to minimize the MSE for each user signal. Thus, the received
information bits are estimated sequentially with finite delay, instead of as a block.

A decision-feedback-type filter can be used instead of a linear filter to implement
the multiuser detector that processes the data sequentially. In particular, Xie et al.
(1990b) demonstrated that the transmitted bits may be recovered sequentially from
the received signal by employing a form of a decision-feedback equalizer with finite
delay. Hence, there is a similarity between the detection of signals corrupted by ISI in
a single-user communication system and the detection of signals in a multiuser system
with asynchronous transmission.


16.3-4 Successive Interference Cancellation 

Another multiuser detection technique is called successive interference cancellation 
(SIC). This technique is based on removing the interfering signal waveforms from 
the received signal, one at a time as they are detected. One approach is to demodulate 
the users in the order of decreasing received powers. Thus, the user having the strongest
received signal is demodulated first. After a signal has been demodulated and detected,
the detected information is used to subtract the signal of the particular user from the
received signal.

When making a decision about the transmitted information of the $k$th user, we
assume that the decisions of users $k+1, \ldots, K$ are correct and neglect the presence of
users $1, \ldots, k-1$. Therefore, the decision for the information bit of the $k$th user, for
synchronous transmission, is

$$\hat{b}_k = \mathrm{sgn}\Big[ r_k - \sum_{j=k+1}^{K} \sqrt{\mathcal{E}_j}\, \rho_{jk}(0)\, \hat{b}_j \Big] \tag{16.3-57}$$

where $r_k$ is the output of the correlator or matched filter corresponding to the $k$th user's
signature sequence.

The approach based on demodulating the user signals in the order of decreasing
received powers does not take into account the cross correlations among users. An
alternative approach is to demodulate the user signals according to the powers at the
outputs of the cross correlators or matched filters, i.e., according to the correlation
metrics

$$E\Big[ \Big( \int_0^T g_k(t)\, r(t)\, dt \Big)^2 \Big] = \mathcal{E}_k + \sum_{j \ne k} \rho_{jk}^2(0)\, \mathcal{E}_j + \frac{N_0}{2} \tag{16.3-58}$$

which applies to the case of synchronous transmission.

We make the following observations regarding the SIC of multiuser interference. 
First of all, SIC requires that we estimate the received signal powers of the users in order 
to cancel the interference. Estimation errors result in residual multiuser interference, 
which causes a degradation in performance. Secondly, the interference from users whose 
signals are weaker than the user signal being detected is treated as additive interference. 
Thirdly, the computational complexity in the demodulation of a user information bit 
is linear in the number of users. Finally, the delay in demodulating the weakest user 
increases linearly with the number of users. 

SIC is easily generalized to asynchronous signal transmission. In this case, both 
the user signal strengths and the time delays must be estimated. 
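
The successive cancellation procedure for synchronous transmission can be sketched as follows. The sketch assumes that the received energies and cross correlations are known exactly (in practice they must be estimated, as noted above) and orders the users by decreasing energy; the function name and the numerical values are hypothetical.

```python
from math import sqrt

def sic_detector(r, energies, rho):
    """Hard-decision SIC for synchronous users: detect in order of decreasing
    received energy and subtract each detected user's contribution from the
    correlator outputs of the users that are still undetected."""
    K = len(r)
    residual = list(r)
    order = sorted(range(K), key=lambda k: energies[k], reverse=True)
    decisions = [0] * K                      # 0 marks "not yet detected"
    for k in order:
        decisions[k] = 1 if residual[k] >= 0 else -1
        for m in range(K):
            if decisions[m] == 0:
                residual[m] -= sqrt(energies[k]) * decisions[k] * rho[k][m]
    return decisions

rho = [[1.0, 0.4], [0.4, 1.0]]
energies = [1.0, 9.0]
r = [-0.2, -2.6]      # noise-free outputs for bits (+1, -1), strong user 2
decisions = sic_detector(r, energies, rho)
```

In this example the strong user is detected first, and removing its contribution restores the weak user's output to $+1$, so both bits are recovered even though the conventional detector would decide the weak user's bit incorrectly.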

Finally, we note that the SIC multiuser detector given in Equation 16.3-57 is
also a suboptimum detector, since the signals of weaker users are treated as additive
interference. The jointly optimum interference canceller for synchronous transmission
may be defined as the detector which computes the decisions $\hat{b}_k$ as

$$\hat{b}_k = \mathrm{sgn}\Big[ r_k - \sum_{j \ne k} \sqrt{\mathcal{E}_j}\, \rho_{jk}(0)\, \hat{b}_j \Big] \tag{16.3-59}$$


Multistage interference cancellation (MIC) Multiuser detection based on MIC is
a technique that employs multiple iterations in detecting the user bits and cancelling
the interference. The method is easily described by means of an example.

EXAMPLE 16.3-1. TWO USERS AND SYNCHRONOUS TRANSMISSION. For the first stage
of the detector, we may use the SIC detector or any of the suboptimum detectors. For
example, suppose we use the decorrelating detector in the first stage. Here the
superscript in parentheses denotes the stage.

First stage (decorrelating detector):

$$\hat{b}_1^{(1)} = \mathrm{sgn}(r_1 - \rho r_2), \qquad \hat{b}_2^{(1)} = \mathrm{sgn}(r_2 - \rho r_1)$$

Second stage:

$$\hat{b}_1^{(2)} = \mathrm{sgn}\big( r_1 - \sqrt{\mathcal{E}_2}\, \rho\, \hat{b}_2^{(1)} \big), \qquad \hat{b}_2^{(2)} = \mathrm{sgn}\big( r_2 - \sqrt{\mathcal{E}_1}\, \rho\, \hat{b}_1^{(1)} \big)$$

Third stage:

$$\hat{b}_1^{(3)} = \mathrm{sgn}\big( r_1 - \sqrt{\mathcal{E}_2}\, \rho\, \hat{b}_2^{(2)} \big), \qquad \hat{b}_2^{(3)} = \mathrm{sgn}\big( r_2 - \sqrt{\mathcal{E}_1}\, \rho\, \hat{b}_1^{(2)} \big)$$


The computations may be terminated when there is no change in the decisions over 
two successive iterations. 
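
The iteration of Example 16.3-1, including the stopping rule, can be sketched as below. The function name, the stage limit `max_stages`, and the numerical values are assumptions introduced for the illustration.

```python
from math import sqrt

def sgn(x):
    return 1 if x >= 0 else -1

def mic_two_users(r1, r2, E1, E2, rho, max_stages=10):
    """Multistage cancellation for two synchronous users. The first stage is a
    decorrelating detector; each later stage cancels the other user's signal
    using the previous stage's decisions. Returns the decisions and the number
    of stages computed."""
    b1, b2 = sgn(r1 - rho * r2), sgn(r2 - rho * r1)
    stages = 1
    while stages < max_stages:
        n1 = sgn(r1 - sqrt(E2) * rho * b2)
        n2 = sgn(r2 - sqrt(E1) * rho * b1)
        stages += 1
        if (n1, n2) == (b1, b2):     # no change over two successive stages
            break
        b1, b2 = n1, n2
    return (b1, b2), stages

# Noise-free correlator outputs for bits (+1, -1), E1 = 1, E2 = 9, rho = 0.4.
result, stages_used = mic_two_users(-0.2, -2.6, 1.0, 9.0, 0.4)
```

Here the second stage reproduces the first-stage decisions, so the computation terminates after two stages.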


Successive interference cancellation and multistage interference cancellation are 
two types of multiple access interference cancellation techniques that have received 
considerable attention by many researchers. For reference, we include the papers by 
Varanasi and Aazhang (1990), Patel and Holtzman (1994), Buehrer et al. (1996, 1999), 
and Divsalar et al. (1998). 

We should indicate that the MIC is a suboptimum detector and does not converge 
to the jointly optimum multiuser detector defined above. 


16.3-5 Other Types of Multiuser Detectors 

Because of the widespread interest in the development of commercial CDMA commu- 
nication systems, the design of multiuser detection algorithms continues to be a very 
active area of research. Our treatment in this chapter focused on the optimum MLSE 
algorithm, suboptimum linear (MMSE and decorrelating detection) algorithms, and 
non-linear successive interference cancellation algorithms based on hard decisions. 

In addition to these relatively simple algorithms, a number of more complex al- 
gorithms have been described in the literature that are appropriate for time-dispersive 
channels which result in ISI. In addition, one may assume that knowledge of the signature
waveforms of the other users is not available to a user receiver. Hence, a user
receiver is confronted with both ISI and multiple access interference (MAI). In such a 
scenario, it is possible to design adaptive interference suppression algorithms that are 
akin to equalization algorithms previously described in Chapter 10. 

Adaptive algorithms for suppressing ISI and MAI in multiuser CDMA systems
are described in the papers by Abdulrahman et al. (1994), Honig (1998), Miller (1995,
1996), Rapajic and Vucetic (1994), and Mitra and Poor (1995). In some cases, the
adaptive algorithms are designed to converge without the use of any training symbols.
Such algorithms are called blind multiuser detection algorithms. Examples of such
blind algorithms are described in the papers by Honig et al. (1995), Madhow (1998),
Wang and Poor (1998a, b), Bensley and Aazhang (1996) and the book by Wang and
Poor (2004).

The use of multiple transmitting and/or receiving antennas in CDMA systems pro- 
vides each user with the opportunity to employ spatial filtering in addition to temporal 
filtering to reduce ISI and MAI and combat signal fading. Blind multiuser detection 
algorithms for multiple antenna systems have been described by Wang and Poor (1999). 

In general, the signals transmitted by the various users in a CDMA communication 
system are coded, either using a single level of coding or a concatenated code. In- 
stead of separating the signal processing of the demodulator from the decoder, a better 
strategy is to use soft-information metrics from the decoder to enhance the suppres- 
sion of the MAI and ISI at the demodulator. Thus, one can devise turbo-type iterative 
demodulation-decoding algorithms for suppressing MAI and ISI. Such algorithms for 
coded CDMA systems have been described in the papers by Reed et al. (1998), Moher 
(1998), Alexander et al. (1999), and Wang and Poor (1999). 


16.3-6 Performance Characteristics of Detectors 

The bit error probability is generally the desirable performance measure in multiuser 
communications. In evaluating the effect of multiuser interference on the performance 
of the detector for a single user, we may use as a benchmark the probability of a bit 
error for a single-user receiver in the absence of other users of the channel, which is 

$$P_k(\gamma_k) = Q\big( \sqrt{2\gamma_k} \big) \tag{16.3-60}$$

where $\gamma_k = \mathcal{E}_k / N_0$, $\mathcal{E}_k$ is the signal energy per bit, and $\tfrac{1}{2} N_0$ is the power spectral density
of the AWGN.

In the case of the optimum detector for either synchronous or asynchronous trans- 
mission, the probability of error is extremely difficult and tedious to evaluate. In this 
case, we may use Equation 16.3-60 as a lower bound and the performance of a subop- 
timum detector as an upper bound. 

Let us consider, first, the suboptimum, conventional single-user detector. For
synchronous transmission, the output of the correlator for the $k$th user is given by
Equation 16.3-27. Therefore, the probability of error for the $k$th user, conditional on a
sequence $\mathbf{b}_i$ of bits from other users, is

$$P_k(\mathbf{b}_i) = Q\left( \sqrt{ \frac{2}{N_0} \Big[ \sqrt{\mathcal{E}_k} + \sum_{\substack{j=1 \\ j \ne k}}^{K} \sqrt{\mathcal{E}_j}\, b_j(1)\, \rho_{jk}(0) \Big]^2 } \right) \tag{16.3-61}$$

Then, the average probability of error is simply

$$P_k = \big( \tfrac{1}{2} \big)^{K-1} \sum_{i=1}^{2^{K-1}} P_k(\mathbf{b}_i) \tag{16.3-62}$$


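The probability in Equation 16.3-62 will be dominated by the term that has the smallest
argument in the $Q$ function. The smallest argument will result in an SNR of

$$(\mathrm{SNR})_{\min} = \frac{1}{N_0} \Big[ \sqrt{\mathcal{E}_k} - \sum_{\substack{j=1 \\ j \ne k}}^{K} \sqrt{\mathcal{E}_j}\, |\rho_{jk}(0)| \Big]^2 \tag{16.3-63}$$

Therefore,

$$\big( \tfrac{1}{2} \big)^{K-1} Q\big( \sqrt{2 (\mathrm{SNR})_{\min}} \big) < P_k < Q\big( \sqrt{2 (\mathrm{SNR})_{\min}} \big) \tag{16.3-64}$$

A similar development can be used to obtain bounds on the performance for
asynchronous transmission.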
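
The average in Equation 16.3-62 and the bounds in Equation 16.3-64 are easy to evaluate numerically for a small number of users. The sketch below expresses $Q(x)$ through the complementary error function and enumerates the interferer bit patterns; the function names and parameter values are assumptions. Unlike the squared form in Equation 16.3-61, the code keeps the sign of the bracketed amplitude, so that a pattern which overwhelms the desired signal contributes an error probability greater than one half.

```python
from itertools import product
from math import erfc, sqrt

def Q(x):
    """Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2.0))

def conventional_error_prob(k, energies, rho, N0):
    """Average of Equation 16.3-61 over all 2^(K-1) interferer bit patterns."""
    others = [j for j in range(len(energies)) if j != k]
    total = 0.0
    for bits in product((-1, 1), repeat=len(others)):
        amp = sqrt(energies[k]) + sum(
            sqrt(energies[j]) * b * rho[j][k] for j, b in zip(others, bits))
        total += Q(sqrt(2.0 / N0) * amp)   # sign of amp is kept
    return total / 2 ** len(others)

def snr_min(k, energies, rho, N0):
    """Worst-case SNR of Equation 16.3-63."""
    others = [j for j in range(len(energies)) if j != k]
    worst = sqrt(energies[k]) - sum(
        sqrt(energies[j]) * abs(rho[j][k]) for j in others)
    return worst ** 2 / N0

rho = [[1.0, 0.4], [0.4, 1.0]]
energies = [1.0, 1.0]
N0 = 0.1
P1 = conventional_error_prob(0, energies, rho, N0)
upper = Q(sqrt(2.0 * snr_min(0, energies, rho, N0)))
lower = 0.5 ** (len(energies) - 1) * upper
```

For these values the average error probability lies just above the lower bound, since the worst-case interference pattern dominates the sum.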

In the case of a decorrelating detector, the other-user interference is completely
eliminated. Hence, the probability of error may be expressed as

$$P_k = Q\left( \sqrt{ \frac{\mathcal{E}_k}{\sigma_k^2} } \right) \tag{16.3-65}$$

where $\sigma_k^2$ is the variance of the noise in the $k$th element of the estimate $\mathbf{b}^0$.

EXAMPLE 16.3-2. Consider the case of synchronous, two-user transmission, where $\mathbf{b}_2^0$
is given by Equation 16.3-41. Let us determine the probability of error.

The signal component for the first term in Equation 16.3-41 is $\sqrt{\mathcal{E}_1}$. The noise
component is

$$n = \frac{n_1 - \rho n_2}{1 - \rho^2}$$

where $\rho$ is the correlation between the two signature signals. The variance of this
noise is

$$\sigma_1^2 = \frac{E[(n_1 - \rho n_2)^2]}{(1 - \rho^2)^2} = \frac{1}{1 - \rho^2} \cdot \frac{N_0}{2} \tag{16.3-66}$$

and

$$P_1 = Q\left( \sqrt{ \frac{2 \mathcal{E}_1}{N_0} (1 - \rho^2) } \right) \tag{16.3-67}$$

A similar result is obtained for the performance of the second user. Therefore, the noise
variance has increased by the factor $(1 - \rho^2)^{-1}$. This noise enhancement is the price
paid for the elimination of the multiuser interference by the decorrelating detector.

The error rate performance of the MMSE detector is similar to that for the
decorrelating detector when the noise level is low. For example, from Equation 16.3-55, we
observe that when $N_0$ is small relative to the diagonal elements of the signal correlation
matrix $\mathbf{R}_N$,

$$\mathbf{b}^0 \approx \mathbf{R}_N^{-1} \mathbf{r} \tag{16.3-68}$$

which is the solution for the decorrelating detector. For low multiuser interference, the
MMSE detector results in a smaller noise enhancement compared with the decorrelating
detector, but has some residual bias resulting from the other users. Thus, the MMSE
detector attempts to strike a balance between the residual interference and the noise
enhancement.
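
The noise enhancement of the two-user decorrelating detector can be quantified with a short calculation based on Equations 16.3-60 and 16.3-67. The sketch below is illustrative only; the function names and the chosen values of $\mathcal{E}_1$, $N_0$, and $\rho$ are assumptions.

```python
from math import erfc, log10, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

def single_user_pe(E, N0):
    """Equation 16.3-60 with gamma = E / N0."""
    return Q(sqrt(2.0 * E / N0))

def decorrelator_pe(E, N0, rho):
    """Equation 16.3-67 for two synchronous users with cross correlation rho."""
    return Q(sqrt(2.0 * E * (1.0 - rho ** 2) / N0))

def snr_loss_db(rho):
    """Noise enhancement factor 1 / (1 - rho^2) expressed in decibels."""
    return -10.0 * log10(1.0 - rho ** 2)

E1, N0 = 1.0, 0.1
for rho in (0.0, 0.25, 0.5, 0.75):
    print(rho, single_user_pe(E1, N0), decorrelator_pe(E1, N0, rho), snr_loss_db(rho))
```

The loss depends only on $\rho$ and not on the energy of the interfering user, which is another way of seeing that the decorrelating detector is near-far resistant.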

An alternative to the error probability as a figure of merit that has been used to
characterize the performance of a multiuser communication system is the ratio of SNRs
with and without the presence of interference. In particular, Equation 16.3-60 gives
the error probability of the $k$th user in the absence of other-user interference. In this
case, the SNR is $\gamma_k = \mathcal{E}_k / N_0$. In the presence of multiuser interference, the user that
transmits a signal with energy $\mathcal{E}_k$ will have an error probability $P_k$ that exceeds $P_k(\gamma_k)$.
The effective SNR $\gamma_{ke}$ is defined as the SNR required to achieve the error probability

$$P_k = P_k(\gamma_{ke}) = Q\big( \sqrt{2 \gamma_{ke}} \big) \tag{16.3-69}$$

The efficiency is defined as the ratio $\gamma_{ke} / \gamma_k$ and represents the performance loss due
to the multiuser interference. The desirable figure of merit is the asymptotic efficiency,
defined as

$$\eta_k = \lim_{N_0 \to 0} \frac{\gamma_{ke}}{\gamma_k} \tag{16.3-70}$$

This figure of merit is often simpler to compute than the probability of error.

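EXAMPLE 16.3-3. Consider the case of two symbol-synchronous users with signal
energies $\mathcal{E}_1$ and $\mathcal{E}_2$. Let us determine the asymptotic efficiency of the conventional
detector.

In this case, the probability of error is easily obtained from Equation 16.3-61 and
Equation 16.3-62 as

$$P_1 = \tfrac{1}{2} Q\Big( \sqrt{ 2 \big( \sqrt{\mathcal{E}_1} + \rho \sqrt{\mathcal{E}_2} \big)^2 / N_0 } \Big) + \tfrac{1}{2} Q\Big( \sqrt{ 2 \big( \sqrt{\mathcal{E}_1} - \rho \sqrt{\mathcal{E}_2} \big)^2 / N_0 } \Big)$$

However, the asymptotic efficiency is much easier to compute. It follows from the
definition of Equation 16.3-70 and from Equation 16.3-61 that

$$\eta_1 = \Big[ \max\Big\{ 0,\; 1 - \sqrt{\frac{\mathcal{E}_2}{\mathcal{E}_1}}\, |\rho| \Big\} \Big]^2$$

A similar expression is obtained for $\eta_2$.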
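
The two-user asymptotic efficiencies can be tabulated directly. The sketch below evaluates the expression for the conventional detector from Example 16.3-3 and, for comparison, the value $1 - \rho^2$ that follows from Equation 16.3-67 for the decorrelating detector (its effective SNR is $\gamma_1 (1 - \rho^2)$ at every noise level). The function names and sample values are assumptions.

```python
from math import sqrt

def eta_conventional(E1, E2, rho):
    """Asymptotic efficiency of user 1 with the conventional detector."""
    return max(0.0, 1.0 - sqrt(E2 / E1) * abs(rho)) ** 2

def eta_decorrelator(rho):
    """Asymptotic efficiency of the two-user decorrelating detector."""
    return 1.0 - rho ** 2

rho = 0.5
for E2 in (0.25, 1.0, 4.0, 16.0):
    print(E2, eta_conventional(1.0, E2, rho), eta_decorrelator(rho))
```

With $\rho = 0.5$ the conventional detector's efficiency drops to zero once $\mathcal{E}_2 \ge 4\mathcal{E}_1$, while the decorrelating detector stays at 0.75 regardless of the interferer's energy.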

The asymptotic efficiency of the optimum and suboptimum detectors that we have
described has been evaluated by Verdu (1986c), Lupas and Verdu (1989), and Xie et al.
(1990b). Figure 16.3-3 illustrates the asymptotic efficiencies of these detectors when
$K = 2$ users are transmitting synchronously. These graphs show that when the
interference is small ($\mathcal{E}_2 \to 0$), the asymptotic efficiencies of these detectors are relatively
large (near unity) and comparable. As $\mathcal{E}_2$ increases, the asymptotic efficiency of the
conventional single-user detector deteriorates rapidly. However, the other linear
detectors perform relatively well compared with the optimum detector. Similar conclusions
are reached by computing the error probabilities, but these computations are often more
tedious.

FIGURE 16.3-3

Asymptotic efficiencies of optimum (Viterbi) detector, conventional detector, MMSE detector,
and linear ML detector in a two-user synchronous DS/SSMA system. [From Xie et al. (1990b),
© IEEE.]


16.4 MULTIUSER MIMO SYSTEMS FOR BROADCAST CHANNELS

In the previous section we treated the detection of signals transmitted simultaneously 
by multiple users to a common receiver. This scenario applies, for example, to the 
uplink of a cellular communication system in which the individual users transmit to a 
base station. We observed that the base station has the choice of selecting one of several 
multiuser detection methods to separate and recover the data transmitted by each of the 
multiple users. 

In this section, we consider a broadcast scenario where data are transmitted
simultaneously to multiple users from a common transmitting site. The transmitter is assumed
to employ $N_T$ antennas to transmit the data to $K$ geographically distributed receivers,
where $N_T \ge K$. Each user is assumed to have a receiver with one or more receiving
antennas. This scenario applies, for example, to the downlink (broadcast mode) of a
wireless local-area network (LAN) or a cellular communication system in which the
channel is a MIMO channel. The distinguishing feature of this MIMO broadcast system
is that the receivers are geographically distributed (point-to-multipoint transmission)
and employ no coordination in processing the received signals. In contrast, the
point-to-point MIMO systems that were treated in Chapter 15 exploited the availability of
the signals from all the antennas in detecting the data.

In the MIMO broadcast scenario considered in this section, there are two possible 
approaches for dealing with the multiple-access interference (MAI) resulting from the 
simultaneous transmission to multiple users. One approach is to have each receiver 
employ interference mitigation in the recovery of its desired signal. In most cases, 
this approach is impractical because the users lack the processing capabilities and are 
constrained by the limited energy resources inherent in the use of battery power. The 
alternative approach is to employ interference mitigation techniques at the base station, 
which possesses significantly greater processing capabilities and energy resources. We 
adopt this more practical approach to interference mitigation for the MIMO broadcast 
channel. 

MAI mitigation at the base station requires that the transmitter know the channel 
characteristics, typically the channel impulse response. This channel state information 
(CSI) may be obtained by channel measurements performed at each of the receivers by 
means of received pilot signals transmitted by the base station. Then the CSI must be 
transmitted to the base station for use in MAI mitigation. In some systems, the uplink 
and downlink channels are identical, e.g., the same frequency band is employed for 
both the uplink and downlink, but separate time slots are used for transmission. This 
transmission mode is called time-division duplex (TDD). In TDD operation, the pilot 
signals for channel measurement may be transmitted by each of the users in the uplink. 
In any case, we assume that the channel time variations are relatively slow so that a 
reliable estimate of the channel characteristics is available at the base station. In the 
treatment given in this section, we assume that the CSI at the transmitter is perfect. 

The suppression of MAI by means of transmitter processing is usually called signal 
precoding. Although we will not include coded signal transmission in this discussion of 
MAI suppression, the addition of channel coding to achieve a rate near channel capacity 
is essential. In a paper entitled “Writing on Dirty Paper,” Costa (1983) demonstrated 
that the capacity of an additive Gaussian noise channel further corrupted by additive 
interference that is known at the transmitter is the same as the capacity of the additive 
Gaussian noise channel without the additional interference. The analogy to writing on 
dirty paper is that if the writer (transmitter) knows where the dirt is located on the paper, 
the message can be written in a way that the reader (receiver) can recover the message 
without any knowledge of the location of the dirt. To elaborate, suppose the transmitter 
first selects a codeword $x_1$ to be transmitted to receiver 1. Then the transmitter selects 
a codeword $x_2$ to be transmitted to receiver 2, with knowledge of the codeword $x_1$ to 
be sent to receiver 1. In such a case, the transmitter can presubtract $x_1$ from $x_2$, so that 
receiver 2 will receive $x_2$ without interference. The signal precoding performed at the 
transmitter to suppress MAI is sometimes called dirty paper precoding. 

FIGURE 16.4-1 

Model of MIMO broadcast system employing linear precoding. 

Signal precoding at the transmitter may take one of several forms, depending on the 
criterion or the method used to perform the precoding. The simplest precoding methods 
are linear and are based on either the zero-forcing (ZF) criterion or the mean-square- 
error (MSE) criterion. Alternatively, there are nonlinear signal precoding methods that 
result in better system performance. We begin with a treatment of linear precoding and 
then we describe three nonlinear precoding methods. 


16.4-1 Linear Precoding of the Transmitted Signals 

For convenience and mathematical simplicity, we assume that each user has a single 
antenna and the number of receivers (users) is $K \le N_T$. It is also convenient to assume 
that the channel is nondispersive. The communication system configuration is shown 
in Figure 16.4-1, where the precoding matrix is denoted as $A_T$. Hence, the received 
signal vector is 

$$y = H A_T s + \eta \tag{16.4-1}$$

where $H$ is a $K \times N_T$ matrix, $A_T$ is an $N_T \times K$ matrix, $s$ is a $K \times 1$ vector, and $\eta$ is 
a $K \times 1$ Gaussian noise vector. The matrix that eliminates the MAI at each receiver is 
generally given by the Moore-Penrose pseudoinverse (see Appendix A) 

$$H^{+} = H^H (H H^H)^{-1} \tag{16.4-2}$$

Hence, the precoding matrix is 

$$A_T = \alpha H^{+} \tag{16.4-3}$$

where $\alpha$ is a scale factor that is selected to satisfy the total transmitted power 
allocation, i.e., $\|A_T s\|^2 = P$. Thus, the precoding matrix in Equation 16.4-3 allows the 
individual users to recover their desired symbols without any interference from the 
signals transmitted to the other users. We also observe that in the special case where 
$K = N_T$, $A_T = \alpha H^{-1}$. Furthermore, we note that when the symbols transmitted to 
the K users are selected from the same constellation, all users have the same SNR at 
their receivers and the corresponding data rates are also identical. 
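
As a minimal sketch of Equations 16.4-1 and 16.4-3 for the special case $K = N_T = 2$, the following Python fragment applies channel inversion to one symbol vector. The channel values, the symbols, and the helper names (`matmul`, `inverse_2x2`, `zf_precode`) are hypothetical, and the scale factor is computed from the instantaneous symbol vector, which is one possible reading of the power constraint $\|A_T s\|^2 = P$.

```python
import math

def matmul(a, b):
    """Multiply matrices stored as lists of rows (complex entries allowed)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def zf_precode(h, s, power):
    """Return (alpha, x) with x = alpha * H^{-1} s and ||x||^2 = power."""
    u = matmul(inverse_2x2(h), [[sym] for sym in s])   # unscaled H^{-1} s
    alpha = math.sqrt(power / sum(abs(row[0]) ** 2 for row in u))
    return alpha, [alpha * row[0] for row in u]

# Hypothetical channel and QPSK symbols, chosen only for illustration.
H = [[1 + 0.5j, 0.3 - 0.2j],
     [0.4 + 0.1j, 0.8 - 0.6j]]
s = [1 + 1j, -1 + 1j]
alpha, x = zf_precode(H, s, power=1.0)
y = [row[0] for row in matmul(H, [[xi] for xi in x])]  # noiseless received vector
print(alpha, y)   # each y[k] equals alpha * s[k]; no cross-user terms remain
```

In the noiseless case each user observes only its own symbol scaled by $\alpha$, which is the interference-free property stated above.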

The sum capacity of the MIMO broadcast system that employs a channel inversion 
precoder has been investigated by Hochwald and Vishwanath (2005) and by Peel et 
al. (2005). It is shown in these references that the ergodic sum capacity with channel 
inversion, when $K = N_T \to \infty$, approaches a constant independent of K and $N_T$. 
This result is in contrast to the achievable sum capacity of a MIMO system which, as 
we have observed, increases linearly as $\min(N_T, K)$. This poor performance resulting 
from channel inversion is attributed to the large disparity between the smallest and 
largest eigenvalues of the matrix $(HH^H)^{-1}$. 

The effect of the ill-conditioning in the channel matrix H is also observed in the 
error rate performance of the MIMO broadcast system that employs channel inversion 
to suppress the MAI. This ill-conditioning requires an increase in transmit power to 
attain acceptable performance. The error rate performance is illustrated in the following 
example. 

EXAMPLE 16.4-1. The broadcast system modeled by Equations 16.4-1 and 16.4-3 
may be simulated on a computer. The channel matrix elements are complex-valued iid 
zero-mean Gaussian random variables with unit variance. The error rate performance 
of the zero-forcing precoder obtained via Monte Carlo simulation is illustrated in 
Figure 16.4-2 for $K = N_T = 4$, 6, and 10 for QPSK modulation. We observe that 
the error rate increases with an increase in the number of users. We attribute this 
deterioration in performance to the ill-conditioning of the channel matrix H. 

FIGURE 16.4-2 

Performance of ZF linear precoding with $N_T = K = 4, 6, 10$. Performance improves as K 
decreases. 

FIGURE 16.4-3 

Comparison of the sum capacity for the linear precoder as a function of the number of users 
$K$ ($K = N_T$) for an SNR = 10 dB. [From Peel et al. (2005). © IEEE.] 

As we have observed, the major drawback with the zero-forcing solution is that 
when the channel matrix H is ill-conditioned (low gains or high attenuation in some 
of the transmitter-receiver links), the system performance is degraded, due to matrix 
inversion. If we relax the condition that the MAI be zero at all the receivers, the 
performance degradation can be reduced. This can be accomplished by using the linear 
MSE criterion in the design of the precoding matrix $A_T$. Thus, we select $A_T$ to minimize 
the cost function 

$$J(A_T, \alpha) = \arg\min_{\alpha, A_T} E\left[\left\| \frac{1}{\alpha}(H A_T s + \eta) - s \right\|^2\right] \tag{16.4-4}$$

subject to the transmitted power allocation $\|A_T s\|^2 = P$, and where the expectation in 
Equation 16.4-4 is taken over the noise statistics and signal statistics. The solution to 
the MMSE criterion is the precoding matrix 

$$A_T = \alpha H^H (HH^H + \beta I)^{-1} \tag{16.4-5}$$

where $\alpha$ is the scale factor that is selected to satisfy the power allocation and $\beta$ is 
defined as a loading factor, which when selected as $\beta = K/P$ maximizes the 
signal-to-interference-plus-noise ratio (SINR) at the receiver [see Peel et al. (2005)]. 
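
The effect of the loading factor can be illustrated with a small numerical sketch. The fragment below forms the unscaled matrix $H^H(HH^H + \beta I)^{-1}$ for a hypothetical ill-conditioned real $2 \times 2$ channel, once with $\beta = 0$ (channel inversion) and once with $\beta = K/P$. The channel values, the value of $P$, and the helper names are assumptions made only for this illustration.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def regularized_precoder(h, beta):
    """Unscaled H^H (H H^H + beta I)^{-1}; beta = 0 gives channel inversion."""
    hh = hermitian(h)
    gram = matmul(h, hh)
    for k in range(len(gram)):
        gram[k][k] += beta
    return matmul(hh, inverse_2x2(gram))

def frobenius_sq(a):
    """Sum of squared magnitudes of all entries."""
    return sum(abs(z) ** 2 for row in a for z in row)

H = [[1.0, 0.9], [0.9, 1.0]]      # hypothetical ill-conditioned channel
K, P = 2, 10.0                    # hypothetical number of users and power
A_zf = regularized_precoder(H, 0.0)
A_mmse = regularized_precoder(H, K / P)
G_zf = matmul(H, A_zf)            # effective channel, identity up to rounding
G_mmse = matmul(H, A_mmse)        # effective channel with residual MAI
print(frobenius_sq(A_zf), frobenius_sq(A_mmse))
```

The regularized matrix has a much smaller Frobenius norm, so less of the power budget is spent on inverting the weak channel direction, at the cost of nonzero off-diagonal terms in the effective channel.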

Figure 16.4-3, taken from the paper by Peel et al. (2005), provides a comparison of 
the sum capacity for the two linear precoders based on the zero-forcing and the MMSE 
criteria. Also shown in this figure is the ergodic sum capacity of the MIMO channel 
when the channel characteristics are known at the transmitter. We observe that the sum 
capacity of the linear precoder designed on the basis of the MMSE criterion increases 
linearly with K, but it has a smaller slope than the theoretical limit. 

The error rate performance of the MMSE linear precoder obtained by Monte Carlo 
simulation in a frequency-nonselective Rayleigh fading channel is illustrated in 
Figure 16.4-4 for $K = N_T = 4$, 6, and 10. We observe that the error rate performance 
improves slightly as the number of users K increases. 

FIGURE 16.4-4 

Performance of MMSE linear precoding with $N_T = K = 4, 6, 10$. Performance improves as K 
increases. 


16.4-2 Nonlinear Precoding of the Transmitted Signals — The QR 
Decomposition 

When the transmitter knows the interference caused on other users by the transmis- 
sion of a signal to any particular user, the transmitter can design signals for each 
of the other users to cancel the MAI. The major problem with such an approach 
is to perform the interference cancellation without increasing the transmitter power. 
We encountered this same issue in our treatment of channel equalization based on 
decision-feedback equalization, where the feedback part of the equalizer was implemented 
at the transmitter (see Section 9.5-4). We recall that when the range of the 
difference between the desired symbol and the ISI exceeded the range of the desired 
transmitted symbol, the difference was reduced by subtracting an integer multiple of 
$2M$ for M-ary PAM, where $[-M, M)$ is the range of the desired transmitted signal. 
This same nonlinear precoding method, called Tomlinson-Harashima precoding, 
can be applied to the cancellation of the MAI in a MIMO broadcast communication 
system. 
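
The modulo reduction recalled above can be sketched for a single real PAM symbol as follows. The interference value and the function name `mod_2m` are hypothetical; the sketch assumes a noiseless channel so that the folding at the transmitter and at the receiver can be followed exactly.

```python
import math

def mod_2m(v, m):
    """Fold v into [-m, m) by subtracting an integer multiple of 2m."""
    return v - 2 * m * math.floor((v + m) / (2 * m))

M = 4                      # 4-PAM with levels -3, -1, 1, 3
symbol = 3.0               # desired symbol
isi = -2.5                 # interference known at the transmitter (hypothetical value)

tx = mod_2m(symbol - isi, M)       # presubtract the interference, then fold into [-M, M)
rx = tx + isi                      # noiseless channel adds the interference back
decoded = mod_2m(rx, M)            # receiver applies the same modulo
print(tx, rx, decoded)             # -2.5 -5.0 3.0
```

The transmitted value stays inside $[-M, M)$ even though the presubtracted value 5.5 lies outside it, and the receiver still recovers the desired symbol.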

FIGURE 16.4-5 

Tomlinson-Harashima precoding applied to a MIMO system. 

Figure 16.4-5 illustrates the precoding operations for the MIMO multiuser system. 
For a frequency-selective channel, the channel impulse response between the ith 
transmit antenna and the receive antenna of the kth user is modeled as 

$$h_{ki}(t) = \sum_{l=0}^{L-1} h_{ki}^{(l)} \delta(t - lT) \tag{16.4-6}$$


where L is the number of multipath components in the channel response, T is the 
symbol duration, and $h_{ki}^{(l)}$ is the complex-valued channel coefficient for the lth path. 
The channel coefficients $\{h_{ki}^{(l)}\}$ are known at the transmitter and are realizations of iid 
zero-mean, circularly symmetric complex Gaussian random variables with variance 

$$E\left[|h_{ki}^{(l)}|^2\right] = \frac{1}{L}, \quad \forall\, k, i, \text{ and } l \tag{16.4-7}$$

It is convenient to arrange these channel coefficients for the lth path in a $K \times N_T$ matrix 
$H^{(l)}$, where $[H^{(l)}]_{ki} = h_{ki}^{(l)}$, $i = 1, 2, \ldots, N_T$, $k = 1, 2, \ldots, K$. 

The MAI cancellation is facilitated by use of the QR decomposition of the channel 
matrix $H^{(0)}$. Thus, we express $[H^{(0)}]^H$ as 

$$[H^{(0)}]^H = QR \tag{16.4-8}$$

where Q is an $N_T \times K$ matrix, such that $Q^H Q = I$, and R is a $K \times K$ upper triangular 
matrix with diagonal elements $\{r_{ii}\}$. Based on this decomposition of $[H^{(0)}]^H$, the signal 
to be transmitted is precoded with the matrix transformation 

$$W = QA \tag{16.4-9}$$


where A is a $K \times K$ diagonal matrix with diagonal elements $1/r_{ii}$, $i = 1, 2, \ldots, K$. 
The $\{r_{ii}\}$ are real and positive [see Tulino and Verdu (2004)]. The matrix $P = \rho I$ is 
a diagonal $K \times K$ matrix that is used simply for scaling the power of the transmitted 
signal and results in equal SNR for all users. Therefore, we have an effective channel 
matrix of the form 

$$H^{(0)} W P = [QR]^H Q A P = \rho R^H A \tag{16.4-10}$$

We note that $R^H A$ is a $K \times K$ lower triangular matrix with unit diagonal elements. As 
a result, user k sees multiple access interference from users $1, 2, \ldots, k-1$. We also 
note that the effective channel matrix $H^{(0)} W = R^H A$ will have full rank K, provided 
that $N_T \ge K$. 

By reducing this channel matrix to a lower triangular matrix, we can now subtract 
the interference at the transmitter that each user would normally observe at his or 
her respective receiver. Thus, when the channel adds the same interference to the 
transmitted signal, the received signal at each receiver will be free of interference. 
By taking advantage of the lower triangular matrix structure, successive interference 
cancellation is performed with the feedback filter defined by the matrix 

$$B = [I - H^{(0)}W, \; -H^{(1)}W, \; -H^{(2)}W, \; \ldots, \; -H^{(L-1)}W] \tag{16.4-11}$$

where the matrix $(I - H^{(0)}W)$ is used to cancel the interference due to the other users that 
arises in the current symbol interval, and the terms $-H^{(1)}W, -H^{(2)}W, \ldots, -H^{(L-1)}W$ 
are used to cancel the interference due to previous symbols. 

To ensure that the subtraction of the interference terms does not result in an increase 
of transmitter power, we use the modulo operator, as in Tomlinson-Harashima 
precoding, to limit the range of the signal to the boundaries of the signal constellation. 
Thus, the output of the modulo operators for the nth symbol vector, as shown in Figure 
16.4-5, is (for square QAM constellations) 

$$x(n) = \mathrm{mod}_{2\sqrt{M}}\left[s(n) + B\tilde{x}(n)\right] = s(n) + B\tilde{x}(n) - 2\sqrt{M}\, z_x(n) \tag{16.4-12}$$

where the modulo operation is performed on each real and imaginary component of the 
vector $[s(n) + B\tilde{x}(n)]$, $x(n)$ is the $K \times 1$ vector at the output of the modulo operator, 
$s(n)$ is the $K \times 1$ data vector, $\tilde{x}(n)$ is the stacked vector defined as 

$$\tilde{x}(n) = [x(n)^t, x(n-1)^t, x(n-2)^t, \ldots, x(n-(L-1))^t]^t \tag{16.4-13}$$

and $z_x(n)$ is a $K \times 1$ vector with complex-valued components whose real and imaginary 
parts take on integer values, determined by the constraint that the real and imaginary 
components of $x(n)$ fall in the range $[-\sqrt{M}, \sqrt{M})$. Therefore, the transmitted signal 
vector is expressed as 

$$s'(n) = W P x(n) = \rho W x(n) \tag{16.4-14}$$

and the received signal vector is 

$$r(n) = \rho \sum_{i=0}^{L-1} H^{(i)} W x(n-i) + \eta(n) \tag{16.4-15}$$

Hence, 

$$\rho^{-1} r(n) = x(n) + (H^{(0)}W - I)x(n) + \sum_{i=1}^{L-1} H^{(i)} W x(n-i) + \eta'(n) \tag{16.4-16}$$

By substituting for $B$ and $\tilde{x}(n)$ in Equation 16.4-16, it follows that 

$$\rho^{-1} r(n) = s(n) + \eta'(n) - 2\sqrt{M}\, z_x(n) \tag{16.4-17}$$

Consequently, the MAI and ISI are canceled perfectly, resulting in the test statistics for the 
nth symbol vector as 

$$y(n) = \mathrm{mod}_{2\sqrt{M}}\left[\frac{1}{\rho} r(n)\right] \tag{16.4-18}$$
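
The chain of operations in Equations 16.4-8 through 16.4-18 can be traced with a short sketch under simplifying assumptions that are not part of the treatment above: a real-valued flat channel ($L = 1$), 4-PAM symbols in place of QAM, $\rho = 1$, and no noise. The channel values and all function names are hypothetical, and the QR decomposition is computed by classical Gram-Schmidt purely for brevity.

```python
import math

def fold(v, half_width):
    """Fold v into [-half_width, half_width)."""
    return v - 2 * half_width * math.floor((v + half_width) / (2 * half_width))

def gram_schmidt_qr(cols):
    """QR by classical Gram-Schmidt; returns orthonormal columns q and upper-triangular r."""
    k = len(cols)
    q, r = [], [[0.0] * k for _ in range(k)]
    for j, a in enumerate(cols):
        v = list(a)
        for i, qi in enumerate(q):
            r[i][j] = sum(x * y for x, y in zip(qi, a))
            v = [vk - r[i][j] * qk for vk, qk in zip(v, qi)]
        r[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q.append([vk / r[j][j] for vk in v])
    return q, r

H = [[1.0, 0.5], [0.8, -0.6]]        # hypothetical real channel, row k = user k
s = [3.0, -3.0]                      # 4-PAM data symbols, one per user
half_width = 4.0                     # modulo interval [-4, 4)

q, r = gram_schmidt_qr(H)            # the columns of H^T are the rows of H
K = len(H)
w_cols = [[qk / r[i][i] for qk in q[i]] for i in range(K)]          # columns of W = Q A
eff = [[sum(h * w for h, w in zip(H[k], w_cols[i])) for i in range(K)]
       for k in range(K)]            # effective channel H W (unit lower triangular)

x = []
for k in range(K):                   # successive presubtraction, user by user
    interference = sum(eff[k][i] * x[i] for i in range(k))
    x.append(fold(s[k] - interference, half_width))

tx = [sum(w_cols[i][n] * x[i] for i in range(K)) for n in range(len(H[0]))]
rx = [sum(h * t for h, t in zip(H[k], tx)) for k in range(K)]
decoded = [fold(v, half_width) for v in rx]
print(eff, x, decoded)
```

User 1 is precoded first and sees no interference; user 2 has the contribution of user 1 removed before the modulo operation, so both receivers recover their symbols with a modulo operation alone.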


Optimum Ordering of the Decentralized Receivers 

The ordering of the K decentralized receivers affects the construction of the $K \times N_T$ 
channel matrix $H^{(0)}$. There are $K!$ possible column permutations of $[H^{(0)}]^H$, and hence 
there is one QR decomposition associated with each permutation. In turn, there are $K!$ 
transformation matrices $W = QA$, each of which requires a different transmit power. 
To minimize the total transmit power, it is necessary to search over all the column 
permutations of $[H^{(0)}]^H$. Such an exhaustive search procedure is computationally 
time-consuming, except for a small number of users. Foschini et al. (1999) have described 
methods for simplifying the search for the optimum ordering. 
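
For a small number of users the exhaustive search can be written directly. The sketch below assumes a real-valued channel and uses $\sum_i 1/r_{ii}^2$, the squared Frobenius norm of $W = QA$, as a stand-in for the transmit power; this cost measure is an assumption that holds when the precoded symbols have equal average power. The function names and channel values are hypothetical.

```python
import itertools
import math

def r_diagonal(vectors):
    """Diagonal of R in the Gram-Schmidt QR of the matrix with the given columns."""
    basis, diag = [], []
    for a in vectors:
        v = list(a)
        for q in basis:
            proj = sum(qk * ak for qk, ak in zip(q, a))
            v = [vk - proj * qk for vk, qk in zip(v, q)]
        norm = math.sqrt(sum(vk * vk for vk in v))
        diag.append(norm)
        basis.append([vk / norm for vk in v])
    return diag

def best_ordering(h_rows):
    """Search all K! user orderings for the smallest sum of 1 / r_ii^2."""
    best_cost, best_perm = None, None
    for perm in itertools.permutations(range(len(h_rows))):
        diag = r_diagonal([h_rows[k] for k in perm])
        cost = sum(1.0 / (d * d) for d in diag)
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm

H = [[1.0, 0.5], [0.8, -0.6]]        # hypothetical real channel, row k = user k
best_cost, best_perm = best_ordering(H)
print(best_cost, best_perm)          # ordering (1, 0) gives the smaller cost here
```

The loop visits $K!$ orderings, which is why the search is practical only for a small number of users.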

The error rate performance of the QR decomposition method described above has 
been evaluated by Amihood et al. (2006, 2007). Figure 16.4-6 illustrates the symbol 
error probability as a function of the SNR (total transmitted signal power over all 
antennas divided by $N_0$) for QPSK modulation, $L = 1, 2$ and $N_T = K = 2$. The 
Monte Carlo simulation results are also illustrated. The simulation results are obtained 
by transmitting 1000 data symbols over each of 10,000 channel realizations. 

FIGURE 16.4-6 

Performance of optimal QR decomposition with $N_T = K = 2$ and $L = 1$ and 2. 

FIGURE 16.4-7 

Performance of optimal ordered QR decomposition with $K = 2$, $L = 1$ and $N_T = 2$, 3, and 4. 

Figure 16.4-7 shows the symbol error rate performance for QPSK with $L = 1$ (flat 
fading), $K = 2$, and $N_T = 2, 3, 4$. We observe that the system performance improves 
with an increase in the number of transmit antennas, which reflects the benefit of spatial 
diversity. 

Figure 16.4-8 shows a comparison of the error rate performance of the linear 
ZF and MMSE precoding methods with the QR decomposition method for QPSK 
modulation with $L = 1$ and $K = N_T = 4$. Figure 16.4-9 shows a similar comparison 
for $K = N_T = 6$. We observe that the performance of the QR decomposition method is 
better than that of the linear precoders at high SNRs but poorer at low SNRs. However, 
the improvement in performance of the QR decomposition method at high SNRs should 
be weighed against the significantly higher computational complexity compared with 
the linear MMSE precoder. 


16.4-3 Nonlinear Vector Precoding 

The QR decomposition method described in Section 16.4-2 is one of several nonlinear 
precoding techniques described in the literature for suppressing MAI in MIMO broad- 
cast communication systems. These methods may be generally described as vector 
precoding techniques. 

FIGURE 16.4-8 

Comparison of the QR decomposition and the linear precoders with $N_T = K = 4$. 

FIGURE 16.4-9 

Comparison of the QR decomposition and the linear precoders with $N_T = K = 6$. 

FIGURE 16.4-10 

Model of MIMO broadcast system employing vector precoding. 

Hochwald et al. (2005) have proposed and evaluated the performance of a vector 
precoding technique in which the data vector to be transmitted to the K users is modified 
by the addition of a precoding vector with integer elements. In particular, let us consider 
a modification of the linear zero-forcing precoder in which each element of the data 
vectors is offset by some judiciously selected integer, as illustrated in Figure 16.4-10. 
Thus, the offset data vector becomes 

$$s' = s + \tau p \tag{16.4-19}$$

where $\tau$ is a real positive number and p is a K-dimensional vector with complex-valued 
elements, where the real and imaginary components are integers. Hence, for $N_T = K$, 
the transmitted signal vector is 

$$x = A_T(s + \tau p) = \alpha H^{-1}(s + \tau p) \tag{16.4-20}$$

The offset vector p is chosen to minimize the power in the transmitted signal, i.e., 

$$p = \arg\min_{p} \|\alpha H^{-1}(s + \tau p)\|^2 \tag{16.4-21}$$

Hence, the vector perturbation method jointly optimizes the perturbation vector for 
the signals that are transmitted to all the receivers. Algorithms for solving this 
least-squares K-dimensional integer-lattice problem are given in the paper by Hochwald 
et al. (2005). 
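
To make the minimization in Equation 16.4-21 concrete, the following sketch searches a small set of candidate offsets by brute force for $K = N_T = 2$. This is not one of the algorithms of Hochwald et al. (2005); it is an exhaustive search over offsets whose real and imaginary parts lie in a small range, with the scale factor omitted, and the channel, symbols, and function names are hypothetical.

```python
import itertools

def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def precoded_power(h_inv, v):
    """Squared norm of H^{-1} v for a length-2 complex vector v."""
    return sum(abs(h_inv[i][0] * v[0] + h_inv[i][1] * v[1]) ** 2 for i in range(2))

def best_perturbation(h, s, tau, span=1):
    """Brute-force search over offsets with integer parts in [-span, span]."""
    h_inv = inverse_2x2(h)
    ints = range(-span, span + 1)
    offsets = [complex(re, im) for re in ints for im in ints]
    return min(itertools.product(offsets, repeat=2),
               key=lambda p: precoded_power(
                   h_inv, [s[0] + tau * p[0], s[1] + tau * p[1]]))

H = [[1.0, 0.9], [0.5, 0.6]]         # hypothetical poorly conditioned channel
s = [1 + 1j, -1 + 1j]                # QPSK symbols
tau = 4.0                            # offset spacing assumed for this constellation
p = best_perturbation(H, s, tau)
H_inv = inverse_2x2(H)
power_zero = precoded_power(H_inv, s)
power_best = precoded_power(H_inv, [s[0] + tau * p[0], s[1] + tau * p[1]])
print(p, power_zero, power_best)
```

The search space grows exponentially with K, which is why practical schemes rely on lattice search algorithms rather than enumeration.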

It is demonstrated in Hochwald et al. (2005) that the optimization of the perturbation 
vector p results in an offset data vector s' that, on average, is oriented toward each 
eigenvector of $(HH^H)^{-1}$ in inverse proportion to the corresponding eigenvalue. This vector 
precoding method generally yields better error rate performance than the QR decomposition 
method, described in the previous section, that employs scalar Tomlinson-Harashima 
precoding. 

The perturbation vector p is not known to the receivers. However, by constraining 
the elements of p to be integers, the receivers may use the modulo operation, as in 
Tomlinson-Harashima precoding, to recover the data components. The scalar $\tau$ in 
Equation 16.4-19 is selected large enough that each receiver applies the modulo function to the real and 
imaginary components of each element of the received vector $y = Hx + \eta$ to recover 
the corresponding element of the data vector s. It is desirable to choose $\tau$ so that it 
results in a symmetric decoding region around the real and imaginary components 
of every signal constellation symbol. The choice of $\tau$ that accomplishes this desired 
goal is 

$$\tau = 2|s_k|_{\max} + \Delta \tag{16.4-22}$$

where $|s_k|_{\max}$ is the magnitude of the signal constellation symbol having the largest 
magnitude and $\Delta$ is the distance between adjacent constellation symbols. 

The vector perturbation technique may also be applied to the linear precoder based 
on the MMSE criterion. In this case, the transmitted vector is 

$$x = A_T(s + \tau p) = \alpha H^H (HH^H + \beta I)^{-1}(s + \tau p) \tag{16.4-23}$$

where p is selected to minimize the power of the transmitted signal, i.e., 

$$p = \arg\min_{p} \|\alpha H^H (HH^H + \beta I)^{-1}(s + \tau p)\|^2 \tag{16.4-24}$$

where $\alpha$ is selected to satisfy the transmitted power allocation constraint, $\beta$ is selected 
to maximize the signal-to-interference-plus-noise ratio, and $\tau$ is selected as 
described previously to result in a symmetric decoding region around the real and 
imaginary components of every signal constellation symbol. Hence the received signal 
vector is 

$$r = \alpha H H^H (HH^H + \beta I)^{-1}(s + \tau p) + \eta \tag{16.4-25}$$

The mth user assumes that its received signal has the form 

$$r_m = \alpha(s_m + \tau p_m) + \eta'_m \tag{16.4-26}$$

where $\eta'_m$ includes the additive channel noise and the MAI from other users due to the 
nonzero factor $\beta$. Since each user knows $\alpha$ and $\tau$, the mth user performs the modulo 
operation on $r_m$ to remove $p_m$ and passes the result to its decoder. It is demonstrated 
in Hochwald et al. (2005) that the performance of this vector perturbation scheme is 
significantly better than the linear MMSE precoder described in Section 16.4-1. 
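
The choice of $\tau$ in Equation 16.4-22 and the modulo step at the receiver can be sketched as follows. The sketch interprets $|s_k|_{\max}$ per real dimension, i.e., as the largest amplitude level of the real or imaginary component, which is an assumption about the intended reading; it also assumes a noiseless observation from which the scale factor has already been removed. The function names are hypothetical.

```python
import math

def tau_for_levels(levels):
    """tau = 2 * max|level| + spacing for equally spaced per-dimension levels."""
    ordered = sorted(levels)
    spacing = ordered[1] - ordered[0]
    return 2 * max(abs(v) for v in ordered) + spacing

def mod_tau(value, tau):
    """Fold the real and imaginary parts separately into [-tau/2, tau/2)."""
    def fold(v):
        return v - tau * math.floor(v / tau + 0.5)
    return complex(fold(value.real), fold(value.imag))

tau = tau_for_levels([-3, -1, 1, 3])     # per-dimension levels of 16-QAM
s = complex(3, -1)                       # data symbol
p = complex(-2, 1)                       # offset, unknown to the receiver
received = s + tau * p                   # noiseless, scale factor removed
recovered = mod_tau(received, tau)
print(tau, received, recovered)          # 8 (-13+7j) (3-1j)
```

With $\tau = 8$ the folding interval is $[-4, 4)$, so the outermost levels $\pm 3$ keep the same margin of $\Delta/2 = 1$ to the boundary that separates adjacent levels.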


16.4-4 Lattice Reduction Technique for Precoding 

Lattice constellations are quite common in designing signal sets for communication 
systems. We have studied the main properties of lattices and lattice-based constellations 
in Section 4.7. Lattice precoding is a technique similar to the Tomlinson-Harashima 
precoding that can be used with channels with known interference at the transmitter. 

We consider the MIMO broadcast channel model with $N_T$ transmit antennas at the 
base station and K receivers each with a single antenna. We also assume $K \le N_T$. The 
input-output relation for the channel is written as 

$$y = Hx + \eta \tag{16.4-27}$$

where x and y are the transmitted and received signals with $N_T$ and K components, 
respectively, $\eta$ is a vector of iid random variables each drawn according to $\mathcal{CN}(0, N_0)$, 
and H is a $K \times N_T$ matrix of complex channel coefficients. As previously stated, the 
matrix H is assumed to be perfectly known at the transmitter. 

The original lattice reduction techniques were developed for real lattices and in 
order to employ them it is convenient to introduce a real equivalent of the communication 
system under study. Equation 16.4-27 is equivalent to the following form in which all 
quantities are real 

$$\begin{bmatrix} \mathrm{Re}(y) \\ \mathrm{Im}(y) \end{bmatrix} = \begin{bmatrix} \mathrm{Re}(H) & -\mathrm{Im}(H) \\ \mathrm{Im}(H) & \mathrm{Re}(H) \end{bmatrix} \begin{bmatrix} \mathrm{Re}(x) \\ \mathrm{Im}(x) \end{bmatrix} + \begin{bmatrix} \mathrm{Re}(\eta) \\ \mathrm{Im}(\eta) \end{bmatrix} \tag{16.4-28}$$

This equation can be written as 

$$y_r = H_r x_r + \eta_r \tag{16.4-29}$$
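
The equivalence between the complex model and its real counterpart can be checked numerically. The sketch below builds $H_r$ and $x_r$ from a small complex example and compares $H_r x_r$ with the stacked real and imaginary parts of $Hx$, with the noise term omitted; the numerical values and function names are hypothetical.

```python
def real_equivalent_matrix(h):
    """Stack [[Re(H), -Im(H)], [Im(H), Re(H)]] for a complex matrix given as rows."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in h]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in h]
    return top + bottom

def real_equivalent_vector(v):
    """Stack the real parts followed by the imaginary parts."""
    return [z.real for z in v] + [z.imag for z in v]

def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

H = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 2j]]
x = [1 - 1j, 2 + 1j]
y = matvec(H, x)                                       # complex model, noise omitted
y_r = matvec(real_equivalent_matrix(H), real_equivalent_vector(x))
print(y, y_r)                                          # y_r stacks Re(y) then Im(y)
```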


The vector of data symbols intended for the K receivers is denoted by s, which 
is a K-dimensional vector with components in an M-ary QAM constellation which is 
defined as a set of lattice points with a given boundary. 

We have seen different types of precoding in the previous sections, among them 
the zero-forcing precoding matrix of the form $A_{Tr} = \alpha H_r^{+} = \alpha H_r^H (H_r H_r^H)^{-1}$, resulting 
in 

$$x_r = A_{Tr} s_r = \alpha H_r^H (H_r H_r^H)^{-1} s_r \tag{16.4-30}$$

and the MMSE precoding matrix of the form $A_{Tr} = \alpha H_r^H (H_r H_r^H + \beta I)^{-1}$, resulting in 

$$x_r = A_{Tr} s_r = \alpha H_r^H (H_r H_r^H + \beta I)^{-1} s_r \tag{16.4-31}$$

as examples of linear precoding, and Tomlinson-Harashima precoding, which uses modulo 
arithmetic at the transmitter and requires a modulo operation at the receiver before quantizing 
to the M-ary QAM constellation. This nonlinear precoding technique is based on the QR 
decomposition of H and successive cancellation, whose performance can be improved 
by optimal ordering of the subchannels using the algorithm described by Foschini et 
al. (1999). 

The perturbation method of Section 16.4-3 can also be expressed in terms of the 
real equivalent matrix representation of Equation 16.4-29 as 

$$x_r = A_{Tr}(s_r + p), \qquad p = \arg\min_{p' \in a\mathbb{Z}^{2K}} \|A_{Tr}(s_r + p')\|^2 \tag{16.4-32}$$

where $\mathbb{Z}^{2K}$ is the 2K-dimensional integer lattice and a is the scalar ($2\sqrt{M}$) in the 
Tomlinson-Harashima modulo operation. The optimization of p in Equation 16.4-32 
can be interpreted as finding the closest point in the lattice $aA_{Tr}\mathbb{Z}^{2K}$ to $-A_{Tr}s_r$, which 
can be accomplished using the Voronoi regions of the lattice. 

As studied in Section 4.7, a lattice can be expressed in terms of its generator matrix 
G whose rows denote a basis for the lattice; i.e., all lattice points can be written as 
a linear combination of the rows of G with integer coefficients. Any lattice $\Lambda$ can 
have many generator matrices and many bases for representation of lattice points. In 
particular, if F is a square matrix with integer entries such that $\det F = \pm 1$, then $F^{-1}$ 
exists and its entries are all integers. Then $G' = FG$ is a generator of lattice $\Lambda$. The 
new generator matrix $G'$ defines a new basis for the lattice $\Lambda$. A desirable property 
of the modified lattice basis is that it be an orthogonal or close-to-orthogonal basis 
with the lowest basis vector norms. The process of finding such a basis for a lattice 
is called lattice reduction. Although lattice reduction in high dimensions is an NP-hard 
problem, a polynomial-time suboptimal lattice reduction method due to Lenstra, 
Lenstra, and Lovasz, known as the LLL algorithm for lattice reduction, exists that in 
most cases gives very good results (Lenstra et al., 1982). 
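
The idea of replacing a skewed basis by a shorter, more nearly orthogonal one can be illustrated in two dimensions. The sketch below is not the LLL algorithm; it is the classical Lagrange-Gauss reduction, which handles only two-dimensional lattices, and it is shown here because every step is an integer row operation of the kind $G' = FG$ with $\det F = \pm 1$. The starting basis and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

def lagrange_gauss_reduce(b1, b2):
    """Reduce a 2-D lattice basis; the integer steps leave the lattice unchanged."""
    while True:
        if dot(b2, b2) < dot(b1, b1):
            b1, b2 = b2, b1                       # keep the shorter vector first
        m = round(dot(b1, b2) / dot(b1, b1))      # nearest-integer projection coefficient
        if m == 0:
            return b1, b2
        b2 = [x - m * y for x, y in zip(b2, b1)]  # unimodular row operation

g1, g2 = [5, 8], [8, 13]                # hypothetical skewed basis of the integer lattice
r1, r2 = lagrange_gauss_reduce(g1, g2)
det_before = g1[0] * g2[1] - g1[1] * g2[0]
det_after = r1[0] * r2[1] - r1[1] * r2[0]
print(r1, r2, det_before, det_after)
```

The magnitude of the determinant is preserved, which confirms that the reduced vectors generate the same lattice, while the basis vector norms drop from $\sqrt{89}$ and $\sqrt{233}$ to 1.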

Since we are looking for p in the lattice $aA_{Tr}\mathbb{Z}^{2K}$ that is closest to $-A_{Tr}s_r$, we can 
apply the LLL algorithm and write 

$$A_{Tr} = W_r F_r \tag{16.4-33}$$

where $W_r$ is a real-valued $2N_T \times 2K$ matrix, representing the transformed 
close-to-orthogonal basis and $F_r$ is the integer-valued matrix with $\det F_r = \pm 1$ that represents 
the transformation. A benefit of a close-to-orthogonal basis with low basis vector norm 
is that when linear interference mitigation techniques are applied to this basis, noise 
enhancement effects are lower. 

In Figure 16.4-11 the left diagram shows the lattice corresponding to $aA_{Tr}\mathbb{Z}^{2}$ 
with its Voronoi regions representing minimum-distance solutions of Equation 16.4-32. 
The original basis for this lattice is denoted by the dashed arrows. Applying LLL to 
this lattice results in the reduced basis denoted by solid arrows which are closer to 
an orthogonal basis compared to the original basis. If we use the original basis for 
linear equalization, we obtain the figure shown in the middle in which the dashed 
arrows are orthonormal. However, the integer grid shown with dashed boundaries does 
not match the modified Voronoi regions. In fact, large white areas that correspond to 
the mismatch between the two regions indicate the inefficiency of this approach. In 
the rightmost figure, the result of applying linear equalization to the reduced basis is 
shown. As seen here, there is good overlap between the modified Voronoi regions and 
the integer grid, indicating the efficiency of this method. 

The lattice reduction method has also been applied directly to lattices in complex 
dimensions using a complex version of the LLL algorithm as described by Gan and 
Mow (2005). In this case the lattice is described by n linearly independent complex row 
vectors $g_1, g_2, \ldots, g_n$ of length n that constitute a basis for the lattice. All lattice points 
can be written as 

$$x = \sum_{i=1}^{n} c_i g_i \tag{16.4-34}$$

where the $c_i$'s are complex numbers with integer real and imaginary parts and the matrix G 
whose rows are the $g_i$'s is the generator of the lattice. Similar to real lattices, if $G' = FG$ 
and F is a square matrix with complex entries with integer real and imaginary parts 
such that $\det F = \pm 1$ or $\det F = \pm j$, then $G'$ is also a basis for the lattice generated 
by G. The complex LLL reduction is of the form $A_T = WF$ where W represents the 
close-to-orthogonal reduced basis. 

FIGURE 16.4-11 

Left: Lattice $aH^{+}\mathbb{Z}^{2}$ and its Voronoi regions with original basis (dashed) and modified basis 
(solid). Middle: Linear equalization applied to the original basis. Right: Linear equalization 
applied to the modified basis. [From Windpassinger et al. (2004), copyright IEEE.] 

Depending on the approach selected, $A_T$ can have different forms. For the zero-forcing 
approach $A_T = \alpha H^{+} = \alpha H^H (HH^H)^{-1}$ and for the MMSE approach 
$A_T = \alpha H^H (HH^H + \beta I)^{-1}$. For the perturbation method which employs Voronoi regions to 
find the closest lattice point, the approximate offset vector is given by 

$$p_{\text{approx}} = -F^{-1} Q(Fs) \tag{16.4-35}$$

where $Q(\cdot)$ denotes the componentwise rounding of the K-dimensional vector to the 
scaled complex integer lattice. 

The lattice reduction technique studied by Windpassinger et al. (2004) indicates 
the effectiveness of this method in improving the performance through increasing the 
diversity gain. In fact the order of signal diversity achieved by the lattice reduction 
technique is comparable to the signal diversity obtained by the maximum-likelihood 
detection, but this signal diversity in the lattice reduction technique is obtained at a 
much lower complexity. The interested reader is referred to Yao and Wornell (2002), 
Fischer and Windpassinger (2003), and Windpassinger et al. (2004) for details. 


16.5 RANDOM ACCESS METHODS 

In this section, we consider a multiuser communication system in which users transmit 
information in packets over a common channel. In contrast to the CDMA method de- 
scribed in Section 16.3, the information signals of the users are not spread in frequency. 
As a consequence, simultaneous transmission of signals from multiple users cannot be 
separated at the receiver, without the use of spatial filtering which can be achieved by 
multiple receiving antennas. The access methods described below are basically ran- 
dom, because packets are generated according to some statistical model. Users access 
the channel when they have one or more packets to transmit. When more than one 
user attempts to transmit packets simultaneously, the packets overlap in time, i.e., they 
collide, and, hence, a conflict results, which must be resolved by devising some channel 
protocol for retransmission of the packets. Below, we describe several random access 
channel protocols that resolve conflicts in packet transmission. 

FIGURE 16.5-1 

Random access packet transmission: 

(a) packets from a typical user; 

(b) packets from several users. 


16.5-1 ALOHA Systems and Protocols 

Suppose that a random access scheme is employed where each user transmits a packet 
as soon as it is generated. When a packet is transmitted by a user and no other user 
transmits a packet for the duration of the time interval, then the packet is considered 
successfully transmitted. However, if one or more of the other users transmits a packet 
that overlaps in time with the packet from the first user, a collision occurs and the 
transmission is unsuccessful. Figure 16.5-1 illustrates this scenario. If the users know 
when their packets are transmitted successfully and when they have collided with other 
packets, it is possible to devise a scheme, which we may call a channel access protocol, 
for retransmission of collided packets. 

Feedback to the users regarding the successful or unsuccessful transmission of 
packets is necessary and can be provided in a number of ways. In a radio broadcast 
system, such as one that employs a satellite relay as depicted in Figure 16.5-2, the 
packets are broadcast to all the users on the downlink. Hence, all the transmitters 
can monitor their transmissions and, thus, obtain the following ternary information: no 
packet was transmitted, or a packet was transmitted successfully, or a collision occurred. 
This type of feedback to the transmitters is generally denoted as (0, 1, c) feedback. In 
systems that employ wireline or fiber-optic channels, the receiver may transmit the 
feedback signal on a separate channel. 


FIGURE 16.5-2 

Broadcast system. 

The ALOHA system devised by Abramson (1970, 1977) and others at the University 
of Hawaii employs a satellite repeater that broadcasts the packets received from the 
various users who access the satellite. In this case, all the users can monitor the satellite 
transmissions and, thus, establish whether or not their packets have been transmitted 
successfully. 

There are basically two types of ALOHA systems: synchronized or slotted and 
unsynchronized or unslotted. In an unslotted ALOHA system, a user may begin trans- 
mitting a packet at any arbitrary time. In a slotted ALOHA, the packets are transmitted 
in time slots that have specified beginning and ending times. 

We assume that the start time of packets that are transmitted is a Poisson point 
process having an average rate of $\lambda$ packets/s. Let $T_p$ denote the time duration of a 
packet. Then, the normalized channel traffic G, also called the offered channel traffic, 
is defined as 

$$G = \lambda T_p \tag{16.5-1}$$

There are many channel access protocols that can be used to handle collisions. Let
us consider the one due to Abramson (1973). In Abramson’s protocol, packets that have
collided are retransmitted with some delay $\tau$, where $\tau$ is randomly selected according
to the PDF

$$p(\tau) = \alpha e^{-\alpha\tau} \tag{16.5-2}$$

where $\alpha$ is a design parameter. The random delay $\tau$ is added to the time of the initial
transmission and the packet is retransmitted at the new time. If a collision occurs
again, a new value of $\tau$ is randomly selected and the packet is retransmitted with a
new delay from the time of the second transmission. This process is continued until
the packet is transmitted successfully. The design parameter $\alpha$ determines the average
delay between retransmissions. The smaller the value of $\alpha$, the longer the delay between
retransmissions.
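As a rough illustration of Equation 16.5-2, the following Python sketch draws retransmission delays from the exponential PDF and compares their sample mean with the theoretical mean $1/\alpha$. The function name `retransmission_delay` and the value $\alpha = 2\ \text{s}^{-1}$ are assumptions chosen only for this example; they do not come from Abramson's work.

```python
import random

def retransmission_delay(alpha, rng):
    """Draw one random delay tau from p(tau) = alpha * exp(-alpha * tau)."""
    return rng.expovariate(alpha)

rng = random.Random(1)          # fixed seed so the run is repeatable
alpha = 2.0                     # hypothetical design parameter, in 1/s
delays = [retransmission_delay(alpha, rng) for _ in range(100_000)]
mean_delay = sum(delays) / len(delays)
print(f"sample mean delay = {mean_delay:.3f} s, 1/alpha = {1 / alpha:.3f} s")
```

Halving `alpha` in this sketch doubles the average delay, which is the behavior described in the paragraph above.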

Now, let $\lambda'$, where $\lambda' < \lambda$, be the rate at which packets are transmitted successfully.
Then, the normalized channel throughput is

$$S = \lambda' T_p \tag{16.5-3}$$

We can relate the channel throughput $S$ to the offered channel traffic $G$ by making
use of the assumed start time distribution. The probability that a packet will not overlap
a given packet is simply the probability that no packet begins $T_p$ seconds before or $T_p$
seconds after the start time of the transmitted packet. Since the start time of all packets
is Poisson-distributed, the probability that a packet will not overlap is $\exp(-2\lambda T_p) =
\exp(-2G)$. Therefore,

$$S = Ge^{-2G} \tag{16.5-4}$$

This relationship is plotted in Figure 16.5-3. We observe that the maximum throughput
is $S_{\max} = 1/(2e) = 0.184$ packets per slot, which occurs at $G = \frac{1}{2}$. When $G > \frac{1}{2}$, the
throughput $S$ decreases. The above development illustrates that an unsynchronized or
unslotted random access method has a relatively small throughput and is inefficient.

FIGURE 16.5-3
Throughput in ALOHA systems.
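The location of the maximum quoted for the unslotted system follows from setting the derivative of Equation 16.5-4 to zero. The short derivation below is added as a worked step and is not part of the original development.

```latex
\frac{dS}{dG} = e^{-2G} - 2G\,e^{-2G} = (1 - 2G)\,e^{-2G} = 0
\quad\Longrightarrow\quad G = \tfrac{1}{2},
\qquad
S_{\max} = \tfrac{1}{2}\,e^{-1} = \frac{1}{2e} \approx 0.184 .
```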

Throughput for slotted ALOHA To determine the throughput in a slotted ALOHA
system, let $G_i$ be the probability that the $i$th user will transmit a packet in some slot. If
all the $K$ users operate independently and there is no statistical dependence between the
transmission of the user’s packet in the current slot and the transmission of the user’s
packet in previous time slots, the total (normalized) offered channel traffic is

$$G = \sum_{i=1}^{K} G_i \tag{16.5-5}$$

Note that, in this case, $G$ may be greater than unity.

Now, let $S_i \le G_i$ be the probability that a packet transmitted in a time slot is
received without a collision. Then, the normalized channel throughput is

$$S = \sum_{i=1}^{K} S_i \tag{16.5-6}$$

The probability that a packet from the $i$th user will not have a collision with another
packet is

$$Q_i = \prod_{\substack{j=1 \\ j \ne i}}^{K} (1 - G_j) \tag{16.5-7}$$

Therefore,

$$S_i = G_i Q_i \tag{16.5-8}$$

A simple expression for the channel throughput is obtained by considering $K$
identical users. Then,

$$S_i = \frac{S}{K}, \qquad G_i = \frac{G}{K}$$

and

$$S = G\left(1 - \frac{G}{K}\right)^{K-1} \tag{16.5-9}$$

Then, if we let $K \to \infty$, we obtain the throughput

$$S = Ge^{-G} \tag{16.5-10}$$


This result is also plotted in Figure 16.5-3. We observe that $S$ reaches a maximum
throughput of $S_{\max} = 1/e = 0.368$ packets per slot at $G = 1$, which is twice the
throughput of the unslotted ALOHA system.
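The two throughput curves can be compared numerically with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the finite-$K$ expression is what Equations 16.5-7 and 16.5-8 give when every user has $G_i = G/K$. The peaks are located by a coarse grid search rather than analytically.

```python
import math

def pure_aloha_throughput(G):
    """Unslotted ALOHA, Equation 16.5-4: S = G * exp(-2G)."""
    return G * math.exp(-2.0 * G)

def slotted_aloha_throughput(G, K=None):
    """Slotted ALOHA: S = G * exp(-G) for K -> infinity (K=None),
    or S = G * (1 - G/K)**(K - 1) for K identical users."""
    if K is None:
        return G * math.exp(-G)
    return G * (1.0 - G / K) ** (K - 1)

# Coarse grid search for the peak of each curve.
grid = [i / 1000 for i in range(1, 3001)]
g_pure = max(grid, key=pure_aloha_throughput)
g_slot = max(grid, key=slotted_aloha_throughput)
print(f"pure:    G* = {g_pure:.3f}, S_max = {pure_aloha_throughput(g_pure):.3f}")
print(f"slotted: G* = {g_slot:.3f}, S_max = {slotted_aloha_throughput(g_slot):.3f}")
print(f"slotted, K = 10 users, G = 1: S = {slotted_aloha_throughput(1.0, K=10):.3f}")
```

With a small number of users, the finite-$K$ throughput at $G = 1$ is slightly higher than the limiting value $1/e$, because each user competes with only $K - 1$ others.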

The performance of the slotted ALOHA system given above is based on Abramson’s
protocol for handling collisions. A higher throughput is possible by devising a
better protocol. 

A basic weakness in Abramson’s protocol is that it does not take into account the 
information on the amount of traffic on the channel that is available from observation of 
the collisions that occur. An improvement in throughput of the slotted ALOHA system 
can be obtained by using a tree-type protocol devised by Capetanakis (1979). In this 
algorithm, users are not allowed to transmit new packets that are generated until all earlier
collisions are resolved. A user can transmit a new packet in a time slot immediately
following its generation, provided that all previous packets that have collided have been 
transmitted successfully. If a new packet is generated while the channel is clearing the 
previous collisions, the packet is stored in a buffer. When a new packet collides with 
another, each user assigns its respective packet to one of two sets, say A or B, with equal 
probability (by flipping a coin). Then, if a packet is put in set A, the user transmits it 
in the next time slot. If it collides again, the user will again randomly assign the packet 
to one of two sets and the process of transmission is repeated. This process continues 
until all packets contained in set A are transmitted successfully. Then, all packets in set 
B are transmitted following the same procedure. All the users monitor the state of the 
channel, and, hence, they know when all the collisions have been serviced. 
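The splitting procedure just described can be sketched as a short recursive simulation. The code below is only an illustration under simplifying assumptions: it models the resolution of a single collision among a fixed number of packets, ignores new arrivals and the time-tag queue, and counts the initial collision slot as well as any idle slots. The function name `slots_to_resolve` is hypothetical.

```python
import random

def slots_to_resolve(n_packets, rng):
    """Count the slots used to clear a collision among n_packets users
    by recursive random splitting into sets A and B (A is served first)."""
    if n_packets <= 1:
        return 1                     # idle slot or successful transmission
    # Collision slot: every colliding user flips a fair coin.
    in_a = sum(rng.random() < 0.5 for _ in range(n_packets))
    in_b = n_packets - in_a
    return 1 + slots_to_resolve(in_a, rng) + slots_to_resolve(in_b, rng)

rng = random.Random(7)
trials = 20_000
for n in (2, 3, 5):
    avg = sum(slots_to_resolve(n, rng) for _ in range(trials)) / trials
    print(f"{n} colliding packets: about {avg:.2f} slots on average")
```

Because all users hear the same channel feedback, each of them can track this recursion independently and knows when the collision has been cleared.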

When the channel becomes available for transmission of new packets, the earliest 
generated packets are transmitted first. To establish a queue, the time scale is subdivided 
into subintervals of sufficiently short duration such that, on average, approximately one 
packet is generated by a user in a subinterval. Thus, each packet has a “time tag” 
that is associated with the subinterval in which it was generated. Then, a new packet 
belonging to the first subinterval is transmitted in the first available time slot. If there 
is no collision, then a packet from the second subinterval is transmitted, and so on. 
This procedure continues as new packets are generated and as long as any backlog of 
packets for transmission exists. Capetanakis has demonstrated that this channel access 
protocol achieves a maximum throughput of 0.43 packets per slot. 

In addition to throughput, another important performance measure in a random 
access system is the average transmission delay in transmitting a packet. In an ALOHA 
system, the average number of transmissions per packet is G/S. To this number we may 
add the average waiting time between transmissions and, thus, obtain an average delay 
for a successful transmission. We recall from the above discussion that in the Abramson
protocol, the parameter $\alpha$ determines the average delay between retransmissions. If we
select $\alpha$ small, we obtain the desirable effect of smoothing out the channel load at times
of peak loading, but the result is a long retransmission delay. This is the trade-off in
the selection of $\alpha$ in Equation 16.5-2. On the other hand, the Capetanakis protocol has
been shown to have a smaller average delay in the transmission of packets. Hence, it
outperforms Abramson’s protocol in both average delay and throughput.
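To make the ratio $G/S$ concrete, it can be evaluated from Equations 16.5-4 and 16.5-10 at the operating points of maximum throughput. The worked values below are an added illustration rather than part of the original text.

```latex
\left.\frac{G}{S}\right|_{\text{unslotted}} = e^{2G}
\;\xrightarrow{\;G = 1/2\;}\; e \approx 2.72,
\qquad
\left.\frac{G}{S}\right|_{\text{slotted}} = e^{G}
\;\xrightarrow{\;G = 1\;}\; e \approx 2.72 .
```

In both cases a packet needs, on average, about 2.72 transmission attempts when the channel operates at its peak throughput.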

Another important issue in the design of random access protocols is the stability of 
the protocol. In our treatment of ALOHA-type channel access protocols, we implicitly 
assumed that for a given offered load, an equilibrium point is reached where the average 
number of packets entering the channel is equal to the average number of packets transmitted
successfully. In fact, it can be demonstrated that any channel access protocol,
such as the Abramson protocol, that does not take into account the number of previous
unsuccessful transmissions in establishing a retransmission policy is inherently unstable.
On the other hand, the Capetanakis algorithm differs from the Abramson protocol
in this respect and has been proved to be stable. A thorough discussion of the stability 
issues of random access protocols is found in the paper by Massey (1988). 


16.5-2 Carrier Sense Systems and Protocols 

As we have observed, ALOHA-type (slotted and unslotted) random access protocols 
yield relatively low throughput. Furthermore, a slotted ALOHA system requires that 
users transmit at synchronized time slots. In channels where transmission delays are 
relatively small, it is possible to design random access protocols that yield higher 
throughput. An example of such a protocol is carrier sensing with collision detection, 
which is used as a standard Ethernet protocol in local area networks. This protocol is 
generally known as carrier sense multiple access with collision detection (CSMA/CD). 

The CSMA/CD protocol is simple. All users listen for transmissions on the channel. 
A user who wishes to transmit a packet seizes the channel when it senses that the channel 
is idle. Collisions may occur when two or more users sense an idle channel and begin 
transmission. When the users that are transmitting simultaneously sense a collision, 
they transmit a special signal, called a jam signal, that serves to notify all users of the 
collision and abort their transmissions. Both the carrier sensing feature and the abortion 
of transmission when a collision occurs result in minimizing the channel downtime and, 
hence, yield a higher throughput. 

To elaborate on the efficiency of CSMA/CD, let us consider a local area network
having a bus architecture, as shown in Figure 16.5-4. Consider two users $U_1$ and $U_2$ at
the maximum separation, i.e., at the two ends of the bus, and let $\tau_d$ be the propagation
delay for a signal to travel the length of the bus. Then, the (maximum) time required
to sense an idle channel is $\tau_d$. Suppose that $U_1$ transmits a packet of duration $T_p$.
User $U_2$ may seize the channel $\tau_d$ seconds later by using carrier sensing and begins to
transmit. However, user $U_1$ would not know of this transmission until $\tau_d$ seconds after
$U_2$ begins transmission. Hence, we may define the time interval $2\tau_d$ as the (maximum)
time interval to detect a collision. If we assume that the time required to transmit the jam
signal is negligible, the CSMA/CD protocol yields a high throughput when $2\tau_d \ll T_p$.

FIGURE 16.5-4
Local area network with bus architecture.
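The condition that the collision-detection window be much shorter than the packet duration is easy to check for a given network. The sketch below assumes a hypothetical LAN (0.5 km bus, 5 µs/km propagation delay, 100 Mbit/s, 12,000-bit packets); none of these numbers come from the text, and the helper name `csma_cd_timing` is illustrative only.

```python
def csma_cd_timing(bus_length_km, delay_us_per_km, bit_rate_bps, packet_bits):
    """Return (tau_d, T_p, a) with times in seconds, where a = tau_d / T_p."""
    tau_d = bus_length_km * delay_us_per_km * 1e-6   # end-to-end propagation delay
    t_p = packet_bits / bit_rate_bps                 # packet duration
    return tau_d, t_p, tau_d / t_p

# Hypothetical LAN: 0.5 km bus, 5 us/km, 100 Mbit/s, 12000-bit packets.
tau_d, t_p, a = csma_cd_timing(0.5, 5.0, 100e6, 12_000)
print(f"tau_d = {tau_d * 1e6:.2f} us, collision window 2*tau_d = {2 * tau_d * 1e6:.2f} us")
print(f"T_p = {t_p * 1e6:.1f} us, a = tau_d/T_p = {a:.4f}")
```

The ratio of propagation delay to packet duration computed here is the quantity that parameterizes the throughput results later in this section.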

There are several possible protocols that may be used to reschedule transmissions
when a collision occurs. One protocol is called nonpersistent CSMA, a second is called
1-persistent CSMA, and a generalization of the latter is called p-persistent CSMA.

Nonpersistent CSMA In this protocol, a user that has a packet to transmit senses 
the channel and operates according to the following rule. 

(a) If the channel is idle, the user transmits a packet. 

(b) If the channel is sensed busy, the user schedules the packet transmission at a later 
time according to some delay distribution. At the end of the delay interval, the user 
again senses the channel and repeats steps (a) and (b). 

1-Persistent CSMA This protocol is designed to achieve high throughput by not
allowing the channel to go idle if some user has a packet to transmit. Hence, the user 
senses the channel and operates according to the following rule. 

(a) If the channel is sensed idle, the user transmits the packet with probability 1. 

(b) If the channel is sensed busy, the user waits until the channel becomes idle and 
transmits a packet with probability one. Note that in this protocol, a collision will 
always occur when more than one user has a packet to transmit. 

p-Persistent CSMA To reduce the rate of collisions in 1-persistent CSMA and
increase the throughput, we should randomize the starting time for transmission of
packets. In particular, upon sensing that the channel is idle, a user with a packet to
transmit sends it with probability $p$ and delays it by $\tau$ with probability $1 - p$. The
probability $p$ is chosen in a way that reduces the probability of collisions while the
idle periods between consecutive (non-overlapping) transmissions are kept small. This
is accomplished by subdividing the time axis into minislots of duration $\tau$ and selecting
the packet transmission at the beginning of a minislot. In summary, in the p-persistent
protocol, a user with a packet to transmit proceeds as follows.

(a) If the channel is sensed idle, the packet is transmitted with probability $p$, and with
probability $1 - p$ the transmission is delayed by $\tau$ seconds.

(b) If at $t = \tau$, the channel is still sensed to be idle, step (a) is repeated. If a collision
occurs, the users schedule retransmission of the packets according to some
preselected transmission delay distribution.

(c) If at $t = \tau$, the channel is sensed busy, the user waits until it becomes idle, and then
operates as in steps (a) and (b) above.
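Steps (a) and (b) amount to a sequence of independent coin flips, one per minislot, for as long as the channel stays idle. The following sketch simulates only that part of the rule; it assumes the channel remains idle throughout and does not model busy periods or collisions. The function name and the value $p = 0.25$ are hypothetical.

```python
import random

def minislots_deferred(p, rng):
    """Number of minislots a user defers before transmitting, assuming the
    channel stays idle: transmit with probability p, else wait one minislot."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    waited = 0
    while rng.random() >= p:     # with probability 1 - p, delay by tau
        waited += 1
    return waited

rng = random.Random(3)
p = 0.25                         # hypothetical persistence probability
samples = [minislots_deferred(p, rng) for _ in range(50_000)]
mean_wait = sum(samples) / len(samples)
print(f"mean deferral = {mean_wait:.2f} minislots, (1 - p)/p = {(1 - p) / p:.2f}")
```

Setting `p = 1.0` makes the loop exit immediately, which recovers the 1-persistent behavior.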

Slotted versions of the above protocol can also be constructed.

The throughput analysis for the nonpersistent and the p-persistent CSMA/CD protocols
has been performed by Kleinrock and Tobagi (1975), based on the following
assumptions:

1. The average retransmission delay is large compared with the packet duration T p . 

2. The interarrival times of the point process defined by the start times of all the packets 
plus retransmissions are independent and exponentially distributed. 

For the nonpersistent CSMA, the throughput is

$$S = \frac{Ge^{-aG}}{G(1 + 2a) + e^{-aG}} \tag{16.5-11}$$

where the parameter $a = \tau_d/T_p$. Note that as $a \to 0$, $S \to G/(1 + G)$. Figure 16.5-5
illustrates the throughput versus the offered traffic $G$, with $a$ as a parameter. We observe
that $S \to 1$ as $G \to \infty$ for $a = 0$. For $a > 0$, the value of $S_{\max}$ decreases.

For the 1-persistent protocol, the throughput obtained by Kleinrock and Tobagi
(1975) is

$$S = \frac{G\left[1 + G + aG\left(1 + G + \tfrac{1}{2}aG\right)\right]e^{-G(1+2a)}}{G(1 + 2a) - \left(1 - e^{-aG}\right) + (1 + aG)e^{-G(1+a)}} \tag{16.5-12}$$

In this case,

$$\lim_{a \to 0} S = \frac{G(1 + G)e^{-G}}{G + e^{-G}} \tag{16.5-13}$$

which has a smaller peak value than the nonpersistent protocol.
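The nonpersistent and 1-persistent throughput expressions can be evaluated numerically to compare their peaks. The sketch below codes the two formulas as they are transcribed here, with hypothetical function names, and searches a logarithmic grid of $G$ values from 0.01 to 1000; the grid resolution is an arbitrary choice.

```python
import math

def nonpersistent_csma(G, a):
    """S = G e^{-aG} / (G(1 + 2a) + e^{-aG})."""
    e = math.exp(-a * G)
    return G * e / (G * (1 + 2 * a) + e)

def one_persistent_csma(G, a):
    """1-persistent throughput, with a = tau_d / T_p."""
    num = G * (1 + G + a * G * (1 + G + a * G / 2)) * math.exp(-G * (1 + 2 * a))
    den = G * (1 + 2 * a) - (1 - math.exp(-a * G)) + (1 + a * G) * math.exp(-G * (1 + a))
    return num / den

grid = [10 ** (k / 100) for k in range(-200, 301)]   # G from 0.01 to 1000
for a in (0.0, 0.01, 0.1):
    s_np = max(nonpersistent_csma(G, a) for G in grid)
    s_1p = max(one_persistent_csma(G, a) for G in grid)
    print(f"a = {a:<4}: peak S nonpersistent = {s_np:.3f}, 1-persistent = {s_1p:.3f}")
```

Setting `a = 0.0` in either function reproduces the limiting forms $G/(1 + G)$ and $G(1 + G)e^{-G}/(G + e^{-G})$, which is a convenient check on the transcription.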



FIGURE 16.5-5
Throughput in nonpersistent CSMA versus offered channel traffic $G$. [From Kleinrock and Tobagi (1975), © IEEE.]

FIGURE 16.5-6
Channel throughput in p-persistent CSMA: (a) $a = 0$; (b) $a = 0.01$; (c) $a = 0.1$. [From Kleinrock and
Tobagi (1975), © IEEE.]




By adopting the p-persistent protocol, it is possible to increase the throughput
relative to the 1-persistent scheme. For example, Figure 16.5-6 illustrates the throughput
versus the offered traffic with $a = \tau_d/T_p$ fixed and with $p$ as a parameter. We observe
that as $p$ increases toward unity, the maximum throughput decreases.

The transmission delay was also evaluated by Kleinrock and Tobagi (1975).
Figure 16.5-7 illustrates the graphs of the delay (normalized by $T_p$) versus the throughput
$S$ for the slotted nonpersistent and p-persistent CSMA protocols. Also shown for
comparison is the delay versus throughput characteristic of the ALOHA slotted and
unslotted protocols. In this simulation, only the newly generated packets are derived
independently from a Poisson distribution. Collisions and uniformly distributed random
retransmissions are handled without further assumptions. These simulation results
illustrate the superior performance of the p-persistent and the nonpersistent protocols
relative to the ALOHA protocols. Note that the graph label “optimum p-persistent”
is obtained by finding the optimum value of $p$ for each value of the throughput. We
observe that for small values of the throughput, the 1-persistent ($p = 1$) protocol is
optimal.

FIGURE 16.5-7
Throughput versus delay from simulation ($a = 0.01$). [From Kleinrock and Tobagi (1975),
© IEEE.]


16.6 BIBLIOGRAPHICAL NOTES AND REFERENCES

FDMA was the dominant multiple access scheme that has been used for decades in 
telephone communication systems for analog voice transmission. With the advent of 
digital speech transmission using PCM, DPCM, and other speech coding methods, 
TDMA has replaced FDMA as the dominant multiple access scheme in telecommunications.
CDMA and random access methods, in general, have been developed over the
past three decades, primarily for use in wireless signal transmission and in local area
wireline networks.

Multiuser information theory deals with basic information-theoretic limits in source
coding for multiple sources, and channel coding and modulation for multiple access
channels. A large amount of literature exists on these topics. In the context of our
treatment of multiple access methods, the reader will find the papers by Cover (1972),
El Gamal and Cover (1980), Bergmans and Cover (1974), Hui (1984), Cover (1998), 
and the book by Cover and Thomas (2006) particularly relevant. The capacity of a 
cellular CDMA system has been considered in the paper by Gilhousen et al. (1991). 

Signal demodulation and detection for multiuser communications has received 
considerable attention in recent years. The reader is referred to the papers by Verdú
(1986a,b,c, 1989), Lupas and Verdú (1990), Xie et al. (1990a,b), Poor and Verdú (1988),
Zhang and Brady (1993), Madhow and Honig (1994), Zvonar and Brady (1995), Viterbi
(1990), Varanasi (1999), and the books by Verdú (1998), Viterbi (1995), and Garg et al.
(1997). Earlier work on signal design and demodulation for multiuser communications 
is found in the papers by Van Etten (1975, 1976), Horwood and Gagliardi (1975), and 
Kaye and George (1970). 

The achievable throughput (capacity) of point-to-multipoint signal transmission 
employing multiple antennas in a Gaussian broadcast channel has been evaluated in 
papers published by Yu and Cioffi (2002), Caire and Shamai (2003), Viswanath and Tse 
(2003), Vishwanath et al. (2003), and Weingarten et al. (2004), as well as in the book 
by Tse and Viswanath (2005). Various precoding schemes for the MIMO broadcast 
channel have been considered in several publications, including the papers by Yu and 
Cioffi (2001), Fischer et al. (2002), Ginis and Cioffi (2002), Windpassinger et al. (2003,
2004a, 2004b), Peel et al. (2005), Hochwald et al. (2005), and Amihood et al. (2006, 
2007). The book by Fischer (2002) treats precoding and signal shaping for multichannel 
digital transmission. 

The ALOHA system, which was one of the earliest random access systems, is 
treated in the papers by Abramson (1970, 1977) and Roberts (1975). These papers contain
the throughput analysis for unslotted and slotted systems. More recently, Abramson
(1994) considers an ALOHA system that employs spread spectrum signals and provides
a link to CDMA systems. Stability issues regarding the ALOHA protocols may
be found in the papers by Carleial and Hellman (1975), Ghez et al. (1988), and Massey
(1988). Stable protocols based on tree algorithms for random access channels were 
first given by Capetanakis (1979). The carrier sense multiple access protocols that we 
described are due to Kleinrock and Tobagi (1975). Finally, we mention the IEEE Press 
book edited by Abramson (1993), which contains a collection of papers dealing with 
multiple access communications. 


PROBLEMS 

16.1 In the formulation of the CDMA signal and channel models described in Section 16.3-1, 
we assumed that the received signals are real. For K > 1, this assumption implies 
phase synchronism at all transmitters, which is not very realistic in a practical system. To 
accommodate the case where the carrier phases are not synchronous, we may simply alter 
the signature waveforms for the $K$ users, given by Equation 16.3-1, to be complex-valued,
of the form

$$g_k(t) = e^{j\theta_k} \sum_{n=0}^{L-1} a_k(n)\,p(t - nT_c), \qquad 1 \le k \le K$$

where $\theta_k$ represents the constant phase offset of the $k$th transmitter as seen by the common
receiver.

a. Given this complex-valued form for the signature waveforms, determine the form
of the optimum ML receiver that computes the correlation metrics analogous to
Equation 16.3-15.

b. Repeat the derivation for the optimum ML detector for asynchronous transmission
that is analogous to Equation 16.3-19.

16.2 Consider a TDMA system where each user is limited to a transmitted power $P$, independent
of the number of users. Determine the capacity per user, $C_K$, and the total capacity
$KC_K$. Plot $C_K$ and $KC_K$ as functions of $\mathcal{E}_b/N_0$ and comment on the results as $K \to \infty$.

16.3 Consider an FDMA system with $K = 2$ users, in an AWGN channel, where user 1 is
assigned a bandwidth $W_1 = \alpha W$ and user 2 is assigned a bandwidth $W_2 = (1 - \alpha)W$,
where $0 \le \alpha \le 1$. Let $P_1$ and $P_2$ be the average powers of the two users.

a. Determine the capacities $C_1$ and $C_2$ of the two users and their sum $C = C_1 + C_2$ as a
function of $\alpha$. On a two-dimensional graph of the rates $R_2$ versus $R_1$, plot the graph
of the points $(C_2, C_1)$ as $\alpha$ varies in the range $0 \le \alpha \le 1$.

b. Recall that the rates of the two users must satisfy the conditions

$$R_1 < W_1 \log_2\left(1 + \frac{P_1}{W_1 N_0}\right)$$
$$R_2 < W_2 \log_2\left(1 + \frac{P_2}{W_2 N_0}\right)$$
$$R_1 + R_2 < W \log_2\left(1 + \frac{P_1 + P_2}{W N_0}\right)$$

Determine the total capacity $C$ when $P_1/\alpha = P_2/(1 - \alpha) = P_1 + P_2$, and, thus, show
that the maximum rate is achieved when $\alpha/(1 - \alpha) = P_1/P_2 = W_1/W_2$.

16.4 Consider a TDMA system with $K = 2$ users in an AWGN channel. Suppose that the two
transmitters are peak-power-limited to $P_1$ and $P_2$, and let user 1 transmit for $100\alpha$ percent
of the available time and user 2 transmit $100(1 - \alpha)$ percent of the time. The available
bandwidth is $W$.

a. Determine the capacities $C_1$, $C_2$, and $C = C_1 + C_2$ as functions of $\alpha$.

b. Plot the graph of the points $(C_2, C_1)$ as $\alpha$ varies in the range $0 \le \alpha \le 1$.

16.5 Consider a TDMA system with $K = 2$ users in an AWGN channel. Suppose that the two
transmitters are average-power-limited, with powers $P_1$ and $P_2$. User 1 transmits $100\alpha$
percent of the time and user 2 transmits $100(1 - \alpha)$ percent of the time. The channel
bandwidth is $W$.

a. Determine the capacities $C_1$, $C_2$, and $C = C_1 + C_2$ as functions of $\alpha$.

b. Plot the graph of the points $(C_2, C_1)$ as $\alpha$ varies in the range $0 \le \alpha \le 1$.

c. What is the similarity between this solution and the FDMA system in Problem 16.3?

16.6 Consider a two-user, synchronous CDMA transmission system, where the received
signal is

$$r(t) = \sqrt{\mathcal{E}_1}\,b_1 g_1(t) + \sqrt{\mathcal{E}_2}\,b_2 g_2(t) + n(t), \qquad 0 \le t \le T$$

and $(b_1, b_2) = (\pm 1, \pm 1)$. The noise process $n(t)$ is zero-mean Gaussian and white, with
spectral density $N_0/2$. The demodulator for $r(t)$ is shown in Figure P16.6.

a. Show that the correlator outputs $r_1$ and $r_2$ at $t = T$ may be expressed as

$$r_1 = \sqrt{\mathcal{E}_1}\,b_1 + \sqrt{\mathcal{E}_2}\,\rho\,b_2 + n_1$$
$$r_2 = \sqrt{\mathcal{E}_1}\,b_1\rho + \sqrt{\mathcal{E}_2}\,b_2 + n_2$$

b. Determine the variances of $n_1$ and $n_2$ and the covariance of $n_1$ and $n_2$.

c. Determine the joint PDF $p(r_1, r_2 \mid b_1, b_2)$.

FIGURE P16.6
Correlators with $g_1(t)$ and $g_2(t)$, sampled at $t = T$.


16.7 Consider the two-user, synchronous CDMA transmission system described in Problem 16.6.
The conventional single-user detector for the information bits $b_1$ and $b_2$ gives
the outputs

$$\hat{b}_1 = \mathrm{sgn}(r_1)$$
$$\hat{b}_2 = \mathrm{sgn}(r_2)$$

Assuming that $P(b_1 = 1) = P(b_2 = 1) = \frac{1}{2}$, and $b_1$ and $b_2$ are statistically independent,
determine the probability of error for this detector.

16.8 Consider the two-user, synchronous CDMA transmission system described in Problem 16.6.
$P(b_1 = 1) = P(b_2 = 1) = \frac{1}{2}$ and $P(b_1, b_2) = P(b_1)P(b_2)$. The jointly
optimum detector makes decisions based on the maximum a posteriori probability (MAP)
criterion. That is, the detector computes

$$\max_{b_1, b_2} P[b_1, b_2 \mid r(t),\ 0 \le t \le T]$$

a. For the equally likely information bits $(b_1, b_2)$ show that the MAP criterion is equivalent
to the maximum-likelihood (ML) criterion

$$\max_{b_1, b_2} p[r(t),\ 0 \le t \le T \mid b_1, b_2]$$

b. Show that the ML criterion in (a) leads to the jointly optimum detector that makes
decisions on $b_1$ and $b_2$ according to the following rule:

$$\max_{b_1, b_2} \left(\sqrt{\mathcal{E}_1}\,b_1 r_1 + \sqrt{\mathcal{E}_2}\,b_2 r_2 - \sqrt{\mathcal{E}_1\mathcal{E}_2}\,\rho\,b_1 b_2\right)$$

16.9 Consider the two-user, synchronous CDMA transmission system described in Problem 16.6.
$P(b_1 = 1) = P(b_2 = 1) = \frac{1}{2}$ and $P(b_1, b_2) = P(b_1)P(b_2)$. The individually
optimum detector makes decisions based on the MAP criterion. That is, the detector
computes the a posteriori probabilities

$$P[b_1 \mid r(t),\ 0 \le t \le T] = P[b_1, b_2 = 1 \mid r(t),\ 0 \le t \le T] + P[b_1, b_2 = -1 \mid r(t),\ 0 \le t \le T]$$

and

$$P[b_2 \mid r(t),\ 0 \le t \le T] = P[b_1 = 1, b_2 \mid r(t),\ 0 \le t \le T] + P[b_1 = -1, b_2 \mid r(t),\ 0 \le t \le T]$$

a. Show that an equivalent test statistic for this individually optimum MAP detector for
the information bit $b_1$ is

$$\max_{b_1}\left\{\frac{\sqrt{\mathcal{E}_1}}{N_0}\,r_1 b_1 + \ln\cosh\left[\frac{\sqrt{\mathcal{E}_2}\,r_2 - \sqrt{\mathcal{E}_1\mathcal{E}_2}\,\rho\,b_1}{N_0}\right]\right\}$$

b. By substituting $b_1 = 1$ and $b_1 = -1$ into the expression in (a), show that the test
statistic in (a) is equivalent to selecting $b_1$ according to the relation

$$\hat{b}_1 = \mathrm{sgn}\left\{r_1 - \frac{N_0}{2\sqrt{\mathcal{E}_1}}\ln\frac{\cosh\left[\left(\sqrt{\mathcal{E}_2}\,r_2 + \sqrt{\mathcal{E}_1\mathcal{E}_2}\,\rho\right)/N_0\right]}{\cosh\left[\left(\sqrt{\mathcal{E}_2}\,r_2 - \sqrt{\mathcal{E}_1\mathcal{E}_2}\,\rho\right)/N_0\right]}\right\}$$


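16.10 Show that the asymptotic efficiency of the conventional single-user detector in a CDMA
system with $K$ users transmitting synchronously is

$$\eta_k = \left[\max\left\{0,\; 1 - \sum_{j \ne k} \sqrt{\frac{\mathcal{E}_j}{\mathcal{E}_k}}\,\left|\rho_{jk}(0)\right|\right\}\right]^2$$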


16.11 Consider the jointly optimum detector defined in Problem 16.8 for the two-user, synchronous
CDMA system. Show that the (symbol) error probability for this detector may
be upper-bounded as 


Pe < Q 



+ 2 G 



16.12 Consider the jointly optimum detector defined in Problem 16.8 for the two-user, synchronous
CDMA system.

a. Show that the asymptotic efficiency for this detector for user 1 is

$$\eta_1 = \min\left\{1,\; 1 + \frac{\mathcal{E}_2}{\mathcal{E}_1} - 2\sqrt{\frac{\mathcal{E}_2}{\mathcal{E}_1}}\,|\rho|\right\}$$

b. Plot and compare the asymptotic efficiencies of the jointly optimum detector and the
conventional single-user detector for $\rho = 0.1$ and $\rho = 0.2$.

16.13 Consider the two-user synchronous CDMA system in Problem 16.6. Determine the probability
of error for each user that employs a decorrelating detector when $\mathcal{E}_1 \ne \mathcal{E}_2$.

16.14 Consider a two-user synchronous CDMA system where the received signal is given
in Problem 16.6. Each user employs the minimum MSE detector specified by
Equations 16.3-51 to 16.3-53.

a. Determine the linear transformation matrix $A^0$ for the two users.

b. Show that the MMSE detector approaches the decorrelating detector as $N_0 \to 0$.

c. Show that the MMSE detector approaches the conventional single-user detector as
$N_0 \to \infty$.

16.15 Consider the asynchronous communication system shown in Figure P16.15. The two
receivers are not colocated, and the white noise processes $n^{(1)}(t)$ and $n^{(2)}(t)$ may be
considered to be independent. The noise processes are identically distributed, with power
spectral density $\sigma^2$ and zero-mean. Since the receivers are not colocated, the relative
delays between the users are not the same; denote the relative delay of user $k$ at receiver
$i$ by $\tau_k^{(i)}$. All other signal parameters coincide for the receivers, and the received signal
at receiver $i$ is

$$r^{(i)}(t) = \sum_{k=1}^{2} \sum_{l=-\infty}^{\infty} b_k(l)\,s_k\left(t - lT - \tau_k^{(i)}\right) + n^{(i)}(t)$$

where $s_k(t)$ has support on $[0, T]$. You may assume that the receiver $i$ has full knowledge of
the waveforms, energies, and relative delays $\tau_1^{(i)}$ and $\tau_2^{(i)}$. Although receiver $i$ is eventually
interested only in the data from transmitter $i$, note that there is a free communication
link between the sampler of one receiver, and the postprocessing circuitry of the other.
Following each postprocessor, the decision is attained by threshold detection. In this
problem, you will consider options for postprocessing and for the communication link in
order to improve performance.

a. What is the bit error probability for users 1 and 2 of a receiver pair that does not utilize
the communication link and does not perform postprocessing? Use the following
notation:

$$y_k(l) = \int s_k\left(t - lT - \tau_k^{(k)}\right) r^{(k)}(t)\,dt$$
$$\rho_{12} = \int s_1\left(t - \tau_1^{(1)}\right) s_2\left(t - \tau_2^{(1)}\right) dt$$
$$\rho_{21} = \int s_1\left(t - \tau_1^{(1)}\right) s_2\left(t + T - \tau_2^{(1)}\right) dt$$
$$w_k = \int s_k^2\left(t - \tau_k^{(1)}\right) dt = \int s_k^2\left(t - \tau_k^{(2)}\right) dt$$

FIGURE P16.15

b. Consider a postprocessor for receiver 1 that accepts $y_2(l - 1)$ and $y_2(l)$ from the
communication link and implements the following postprocessing on $y_1(l)$:

$$z_1(l) = y_1(l) - \rho_{21}\,\mathrm{sgn}[y_2(l - 1)] - \rho_{12}\,\mathrm{sgn}[y_2(l)]$$

Determine an exact expression for the bit error rate for user 1.

c. Determine the asymptotic multiuser efficiency of the receiver proposed in (b), and
compare with that in (a). Does this receiver always perform better than that proposed
in (a)?

16.16 In a pure ALOHA system, the channel bit rate is 2400 bits/s. Suppose that each terminal 
transmits a 100-bit message every minute on the average. 

a. Determine the maximum number of terminals that can use the channel. 

b. Repeat (a) if slotted ALOHA is used. 

16.17 An alternative derivation for the throughput in a pure ALOHA system may be obtained
from the relation $G = S + A$, where $A$ is the average (normalized) rate of retransmissions.
Show that $A = G(1 - e^{-2G})$ and then solve for $S$.


16.18 For a Poisson process, the probability of $k$ arrivals in a time interval $T$ is

$$P(k) = \frac{e^{-\lambda T}(\lambda T)^k}{k!}, \qquad k = 0, 1, 2, \ldots$$

a. Determine the average number of arrivals in the interval $T$.

b. Determine the variance $\sigma^2$ in the number of arrivals in the interval $T$.

c. What is the probability of at least one arrival in the interval $T$?

d. What is the probability of exactly one arrival in the interval $T$?

16.19 Refer to Problem 16.18. The average arrival rate is $\lambda = 10$ packets/s. Determine

a. The average time between arrivals. 

b. The probability that another packet will arrive within 1 s; within 100 ms. 

16.20 Consider a pure ALOHA system that is operating with a throughput $S = 0.1$ and packets
are generated with a Poisson arrival rate $\lambda$. Determine

a. The value of G. 

b. The average number of attempted transmissions to send a packet.

16.21 Consider a CSMA/CD system in which the transmission rate on the bus is 10 Mbits/s.
The bus is 2 km and the propagation delay is 5 μs/km. Packets are 1000 bits long.
Determine

a. The end-to-end delay $\tau_d$.

b. The packet duration $T_p$.

c. The ratio $\tau_d/T_p$.

d. The maximum utilization of the bus and the maximum bit rate.

16.22 Consider an MA communication system with $K = 2$ users and an AWGN channel. The
receiver decodes the two signals by performing SIC. The signal power levels for the two
users at the receiver are $P_1$ and $P_2$.

a. Suppose that the receiver decodes the signal for user 2 and subtracts signal 2 from the 
received signal. Then the receiver decodes the signal from user 1 without interference. 
Determine the maximum rates that can be achieved by users 1 and 2. 

b. Now suppose that $P_2 = 10P_1$ and that the signal from user 2 is decoded first.
Determine the sum capacity of the two-user system. 

c. Repeat part b if user 1 is decoded first, and compare the sum capacities in parts b
and c.


APPENDIX A

A matrix is a rectangular array of real or complex numbers called the elements of
the matrix. An $n \times m$ matrix has $n$ rows and $m$ columns. If $m = n$, the matrix is
called a square matrix. An $n$-dimensional vector may be viewed as an $n \times 1$ matrix.
An $n \times m$ matrix may be viewed as having $n$ $m$-dimensional vectors as its rows or $m$
$n$-dimensional vectors as its columns.

The complex conjugate and the transpose of a matrix $A$ are denoted as $A^*$ and $A'$,
respectively. The conjugate transpose of a matrix with complex elements is denoted as
$A^H$; that is, $A^H = [A^*]' = [A']^*$.

A square matrix $A$ is said to be symmetric if $A' = A$. A square matrix $A$ with 
complex elements is said to be Hermitian if $A^H = A$. If $A$ is a square matrix, then $A^{-1}$ 
designates the inverse of $A$ (if one exists), having the property that 

$$A^{-1}A = AA^{-1} = I_n \tag{A-1}$$

where $I_n$ is the $n \times n$ identity matrix, i.e., a square matrix whose diagonal elements are 
unity and off-diagonal elements are zero. If $A$ has no inverse, it is said to be singular. 

The trace of a square matrix $A$ is denoted as $\mathrm{tr}(A)$ and is defined as the sum of the 
diagonal elements, i.e., 

$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii} \tag{A-2}$$

The rank of an $n \times m$ matrix $A$ is the maximum number of linearly independent 
columns or rows in the matrix (it makes no difference whether we take rows or columns). 
A matrix is said to be of full rank if its rank is equal to the number of rows or columns, 
whichever is smaller. 

The following are some additional matrix properties (lowercase letters denote 
vectors): 

$$\begin{aligned} (Av)' &= v'A', & (AB)^{-1} &= B^{-1}A^{-1} \\ (AB)' &= B'A', & (A')^{-1} &= (A^{-1})' \end{aligned} \tag{A-3}$$

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Let A be an n x n square matrix. A nonzero vector v is called an eigenvector of A and 
$\lambda$ is the associated eigenvalue if 

$$Av = \lambda v \tag{A-4}$$


If $A$ is a Hermitian $n \times n$ matrix, then there exist $n$ mutually orthogonal eigenvectors 
$v_i$, $i = 1, 2, \ldots, n$. Usually, we normalize each eigenvector to unit length, so that 

$$v_i^H v_j = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases} \tag{A-5}$$

In such a case, the eigenvectors are orthonormal. 

We define an $n \times n$ matrix $Q$ whose $i$th column is the eigenvector $v_i$. Then 

$$Q^H Q = QQ^H = I \tag{A-6}$$

Furthermore, $A$ may be represented (decomposed) as 

$$A = Q \Lambda Q^H \tag{A-7}$$

where $\Lambda$ is an $n \times n$ diagonal matrix with elements equal to the eigenvalues of $A$. This 
decomposition is called a spectral decomposition of a Hermitian matrix. 
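
As a small numerical illustration of Equations A-5 and A-7, the following sketch computes the eigenvalues and orthonormal eigenvectors of a 2 x 2 Hermitian matrix in closed form and rebuilds the matrix from them. The function names `eig_hermitian_2x2` and `reconstruct` are hypothetical helpers written for this example, and the restriction to the 2 x 2 case is an assumption made to keep the sketch free of external libraries.

```python
import math

def eig_hermitian_2x2(A):
    """Eigenvalues and orthonormal eigenvectors of a 2x2 Hermitian matrix.

    A = [[a, b], [conj(b), d]] with real a and d. Returns (lams, vecs),
    where vecs[i] is the unit-length eigenvector for lams[i].
    """
    a, b, d = A[0][0].real, complex(A[0][1]), A[1][1].real
    if b == 0:
        # Already diagonal: the standard basis vectors are eigenvectors.
        return [a, d], [[1.0, 0.0], [0.0, 1.0]]
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + abs(b) ** 2)
    lams = [mean + radius, mean - radius]
    vecs = []
    for lam in lams:
        v = [b, lam - a]  # satisfies (a - lam) * v[0] + b * v[1] = 0
        norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
        vecs.append([v[0] / norm, v[1] / norm])
    return lams, vecs

def reconstruct(lams, vecs):
    """Form sum_i lam_i v_i v_i^H, i.e. Q Lambda Q^H of Equation A-7."""
    return [[sum(lam * v[i] * complex(v[j]).conjugate()
                 for lam, v in zip(lams, vecs))
             for j in range(2)] for i in range(2)]

A = [[2.0, 1 - 1j], [1 + 1j, 3.0]]
lams, vecs = eig_hermitian_2x2(A)
R = reconstruct(lams, vecs)
```

For the matrix above the eigenvalues are 4 and 1, and `R` agrees with `A` to rounding error.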

If $u$ is an $n \times 1$ nonzero vector for which $Au = 0$, then $u$ is called a null vector of 
$A$. When $A$ is Hermitian and $Au = 0$ for some nonzero vector $u$, then $A$ is singular. A singular 
Hermitian matrix has at least one zero eigenvalue. 

Now, consider the scalar quadratic form $u^H A u$ associated with the Hermitian 
matrix $A$. If $u^H A u > 0$ for every nonzero vector $u$, the matrix $A$ is said to be positive definite. In such a case, all 
the eigenvalues of $A$ are positive. On the other hand, if $u^H A u \geq 0$ for every vector $u$, matrix $A$ is said to 
be positive semidefinite. In such a case, all the eigenvalues of $A$ are nonnegative. 

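The following properties involving the eigenvalues of an arbitrary $n \times n$ matrix 
$A = (a_{ij})_n$ hold: 

$$\sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} a_{ii} = \mathrm{tr}(A) \tag{A-8}$$

$$\prod_{i=1}^{n} \lambda_i = \det(A) \tag{A-9}$$

$$\sum_{i=1}^{n} \lambda_i^k = \mathrm{tr}(A^k) \tag{A-10}$$

$$\mathrm{tr}(AA') = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}^2 \geq \sum_{i=1}^{n} \lambda_i^2, \qquad A \text{ real} \tag{A-11}$$
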
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■ A.2 

SINGULAR-VALUE DECOMPOSITION 

The singular-value decomposition (SVD) is another orthogonal decomposition of a 
matrix. Let us assume that $A$ is an $n \times m$ matrix of rank $r$. Then there exist an $n \times r$ 
matrix $U$, an $m \times r$ matrix $V$, and an $r \times r$ diagonal matrix $\Sigma$ such that 
$U^H U = V^H V = I_r$ and 

$$A = U \Sigma V^H \tag{A-12}$$

where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$. The $r$ diagonal elements of $\Sigma$ are strictly positive and 
are called the singular values of matrix $A$. For convenience, we assume that 
$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$. 

The SVD of matrix $A$ may be expressed as 

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^H \tag{A-13}$$

where $u_i$ are the column vectors of $U$, which are called the left singular vectors of $A$, 
and $v_i$ are the column vectors of $V$, which are called the right singular vectors of $A$. 

The singular values $\{\sigma_i\}$ are the nonnegative square roots of the eigenvalues of 
matrix $A^H A$. To demonstrate this, we postmultiply Equation A-12 by $V$. Thus, we 
obtain 

$$AV = U\Sigma \tag{A-14}$$

or, equivalently, 

$$Av_i = \sigma_i u_i, \qquad i = 1, 2, \ldots, r \tag{A-15}$$

Similarly, we postmultiply $A^H = V \Sigma U^H$ by $U$. Thus, we obtain 

$$A^H U = V\Sigma \tag{A-16}$$

or, equivalently, 

$$A^H u_i = \sigma_i v_i, \qquad i = 1, 2, \ldots, r \tag{A-17}$$

Then, by premultiplying both sides of Equation A-15 with $A^H$ and using 
Equation A-17, we obtain 

$$A^H A v_i = \sigma_i^2 v_i, \qquad i = 1, 2, \ldots, r \tag{A-18}$$

This demonstrates that the $r$ nonzero eigenvalues of $A^H A$ are the squares of the singular 
values of $A$, and the corresponding $r$ eigenvectors $v_i$ are the right singular vectors of $A$. 
The remaining $m - r$ eigenvalues of $A^H A$ are zero. On the other hand, if we premultiply 
both sides of Equation A-17 by $A$ and use Equation A-15, we obtain 

$$AA^H u_i = \sigma_i^2 u_i, \qquad i = 1, 2, \ldots, r \tag{A-19}$$

This demonstrates that the $r$ nonzero eigenvalues of $AA^H$ are the squares of the singular 
values of $A$, and the corresponding $r$ eigenvectors $u_i$ are the left singular vectors of $A$. 
The remaining $n - r$ eigenvalues of $AA^H$ are zero. Hence, $AA^H$ and $A^H A$ have the 
same set of nonzero eigenvalues. 
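
The relation in Equation A-18 can be checked numerically. The sketch below, which assumes a real 2 x 2 matrix so that everything can be done in closed form with the standard library, obtains the singular values as the square roots of the eigenvalues of $A'A$. The helper name `singular_values_2x2` is hypothetical.

```python
import math

def singular_values_2x2(A):
    """Singular values of a real 2x2 matrix from the eigenvalues of A^T A."""
    (p, q), (r, s) = A
    # Entries of the symmetric matrix G = A^T A.
    g11 = p * p + r * r
    g12 = p * q + r * s
    g22 = q * q + s * s
    # Closed-form eigenvalues of the 2x2 symmetric matrix G.
    mean = (g11 + g22) / 2.0
    radius = math.sqrt(((g11 - g22) / 2.0) ** 2 + g12 ** 2)
    lam_max = mean + radius
    lam_min = max(mean - radius, 0.0)  # clip tiny negative rounding error
    return math.sqrt(lam_max), math.sqrt(lam_min)

s1, s2 = singular_values_2x2([[3.0, 0.0], [4.0, 5.0]])
```

For this matrix $A'A$ has eigenvalues 45 and 5, so the singular values are $\sqrt{45}$ and $\sqrt{5}$. Their product equals $|\det A| = 15$, and the sum of their squares equals the sum of the squared entries, 50.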




■ A.3 

MATRIX NORM AND CONDITION NUMBER 




Recall that the Euclidean norm ($L_2$ norm) of a vector $v$, denoted as $\|v\|$, is defined as 

$$\|v\| = (v^H v)^{1/2} \tag{A-20}$$

The Euclidean norm of a matrix $A$, denoted as $\|A\|$, is defined as 

$$\|A\| = \max_{v \neq 0} \frac{\|Av\|}{\|v\|} \tag{A-21}$$

where the maximum is taken over all nonzero vectors $v$. It is easy to verify that the norm of a Hermitian matrix is equal to the 
largest eigenvalue magnitude. 

Another useful quantity associated with a matrix $A$ is the nonzero minimum value 
of $\|Av\|/\|v\|$. When $A$ is a nonsingular Hermitian matrix, this minimum value is equal 
to the smallest eigenvalue magnitude. 

The squared Frobenius norm of an $n \times m$ matrix $A$ is defined as 

$$\|A\|_F^2 = \mathrm{tr}(AA^H) = \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}|^2 \tag{A-22}$$

From the SVD of the matrix $A$, it follows that 

$$\|A\|_F^2 = \sum_{i=1}^{n} \lambda_i \tag{A-23}$$

where $\{\lambda_i\}$ are the eigenvalues of $AA^H$. 

The following are bounds on matrix norms: 

$$\begin{aligned} \|A\| &> 0, \qquad A \neq 0 \\ \|A + B\| &\leq \|A\| + \|B\| \\ \|AB\| &\leq \|A\|\,\|B\| \end{aligned} \tag{A-24}$$

The condition number of a matrix $A$ is defined as the ratio of the maximum value 
to the minimum value of $\|Av\|/\|v\|$. When $A$ is Hermitian, the condition number is 
$\lambda_{\max}/\lambda_{\min}$, where $\lambda_{\max}$ is the largest and $\lambda_{\min}$ is the smallest eigenvalue magnitude 
of $A$. 


■ A.4 

THE MOORE-PENROSE PSEUDOINVERSE 

Let us consider a rectangular $n \times m$ matrix $A$ of rank $r$, having an SVD as $A = U \Sigma V^H$. 
The Moore-Penrose pseudoinverse, denoted by $A^+$, is an $m \times n$ matrix defined as 

$$A^+ = V \Sigma^{-1} U^H \tag{A-25}$$

where $\Sigma^{-1}$ is an $r \times r$ diagonal matrix with diagonal elements $1/\sigma_i$, $i = 1, 2, \ldots, r$. 
We may also express $A^+$ as 

$$A^+ = \sum_{i=1}^{r} \frac{1}{\sigma_i} v_i u_i^H \tag{A-26}$$

We observe that the rank of $A^+$ is equal to the rank of $A$. 

When the rank $r = m$ or $r = n$, the pseudoinverse $A^+$ can be expressed as 

$$\begin{aligned} A^+ &= A^H (AA^H)^{-1}, & r &= n \\ A^+ &= (A^H A)^{-1} A^H, & r &= m \\ A^+ &= A^{-1}, & r &= m = n \end{aligned} \tag{A-27}$$

These relations are equivalent to $AA^+ = I_n$ (when $r = n$) and $A^+ A = I_m$ (when $r = m$). 
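
The full-column-rank case of Equation A-27 can be tried on a small example. The sketch below assumes a real $n \times 2$ matrix of rank 2, so that $A'A$ is a 2 x 2 matrix that can be inverted by hand. The helper name `pinv_full_column_rank` is hypothetical.

```python
def pinv_full_column_rank(A):
    """A^+ = (A^T A)^{-1} A^T for a real n x 2 matrix of rank 2 (A-27, r = m)."""
    n = len(A)
    # Entries of the 2x2 Gram matrix G = A^T A.
    g11 = sum(row[0] * row[0] for row in A)
    g12 = sum(row[0] * row[1] for row in A)
    g22 = sum(row[1] * row[1] for row in A)
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("columns are linearly dependent; rank is less than 2")
    inv = [[g22 / det, -g12 / det], [-g12 / det, g11 / det]]
    # Multiply G^{-1} (2x2) by A^T (2xn) to get the 2xn pseudoinverse.
    return [[inv[i][0] * A[k][0] + inv[i][1] * A[k][1] for k in range(n)]
            for i in range(2)]

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
P = pinv_full_column_rank(A)
# The product A^+ A should be the 2x2 identity matrix.
PA = [[sum(P[i][k] * A[k][j] for k in range(3)) for j in range(2)]
      for i in range(2)]
```

Here `P` is the 2 x 3 matrix with rows (2/3, -1/3, 1/3) and (-1/3, 2/3, 1/3), and `PA` is the identity to rounding error.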


APPENDIX B 


Error Probability for Multichannel Binary Signals 


In multichannel communication systems that employ binary signaling for transmitting 
information over the AWGN channel, the decision variable at the detector can be 
expressed as a special case of the general quadratic form 

$$D = \sum_{k=1}^{L} \left( A|X_k|^2 + B|Y_k|^2 + CX_kY_k^* + C^*X_k^*Y_k \right) \tag{B-1}$$

in complex-valued Gaussian random variables. $A$, $B$, and $C$ are constants; $X_k$ and $Y_k$ 
are a pair of correlated complex-valued Gaussian random variables. For the channels 
considered, the $L$ pairs $\{X_k, Y_k\}$ are mutually statistically independent and identically 
distributed. 

The probability of error is the probability that $D < 0$. This probability is evaluated 
below. 

The computation begins with the characteristic function, denoted by $\psi_D(jv)$, of 
the general quadratic form. The probability that $D < 0$, denoted here as the probability 
of error $P_b$, is 

$$P_b = P(D < 0) = \int_{-\infty}^{0} p(D)\, dD \tag{B-2}$$

where $p(D)$, the probability density function of $D$, is related to $\psi_D(jv)$ by the Fourier 
transform, i.e., 

$$p(D) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \psi_D(jv)\, e^{-jvD}\, dv$$

Hence, 

$$P_b = \int_{-\infty}^{0} dD\, \frac{1}{2\pi} \int_{-\infty}^{\infty} \psi_D(jv)\, e^{-jvD}\, dv \tag{B-3}$$

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Let us interchange the order of integration and carry out first the integration with respect 
to D. The result is 


$$P_b = -\frac{1}{2\pi j} \int_{-\infty + j\varepsilon}^{\infty + j\varepsilon} \frac{\psi_D(jv)}{v}\, dv \tag{B-4}$$

where a small positive number $\varepsilon$ has been inserted in order to move the path of integration 
away from the singularity at $v = 0$ and which must be positive in order to allow for the 
interchange in the order of integration. 

Since $D$ is the sum of statistically independent random variables, the characteristic 
function of $D$ factors into a product of $L$ characteristic functions, with each function 
corresponding to the individual random variables $d_k$, where 

$$d_k = A|X_k|^2 + B|Y_k|^2 + CX_kY_k^* + C^*X_k^*Y_k$$


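The characteristic function of $d_k$ is 

$$\psi_{d_k}(jv) = \frac{v_1 v_2}{(v + jv_1)(v - jv_2)} \exp\left[ \frac{v_1 v_2 (-v^2 \alpha_{1k} + jv\alpha_{2k})}{(v + jv_1)(v - jv_2)} \right] \tag{B-5}$$

where the parameters $v_1$, $v_2$, $\alpha_{1k}$, and $\alpha_{2k}$ depend on the means $\bar{X}_k$ and $\bar{Y}_k$ and the 
second (central) moments $\mu_{xx}$, $\mu_{yy}$, and $\mu_{xy}$ of the complex-valued Gaussian variables 
$X_k$ and $Y_k$ through the following definitions ($|C|^2 - AB > 0$): 

$$\begin{aligned}
v_1 &= \sqrt{w^2 + \frac{1}{4(\mu_{xx}\mu_{yy} - |\mu_{xy}|^2)(|C|^2 - AB)}} - w \\
v_2 &= \sqrt{w^2 + \frac{1}{4(\mu_{xx}\mu_{yy} - |\mu_{xy}|^2)(|C|^2 - AB)}} + w \\
w &= \frac{A\mu_{xx} + B\mu_{yy} + C\mu_{xy}^* + C^*\mu_{xy}}{4(\mu_{xx}\mu_{yy} - |\mu_{xy}|^2)(|C|^2 - AB)} \\
\alpha_{1k} &= 2(|C|^2 - AB)\left( |\bar{X}_k|^2 \mu_{yy} + |\bar{Y}_k|^2 \mu_{xx} - \bar{X}_k^* \bar{Y}_k \mu_{xy} - \bar{X}_k \bar{Y}_k^* \mu_{xy}^* \right) \\
\alpha_{2k} &= A|\bar{X}_k|^2 + B|\bar{Y}_k|^2 + C\bar{X}_k^* \bar{Y}_k + C^* \bar{X}_k \bar{Y}_k^* \\
\mu_{xy} &= \tfrac{1}{2} E\left[ (X_k - \bar{X}_k)(Y_k - \bar{Y}_k)^* \right]
\end{aligned} \tag{B-6}$$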


Now, as a result of the independence of the random variables $d_k$, the characteristic 
function of $D$ is 

$$\psi_D(jv) = \prod_{k=1}^{L} \psi_{d_k}(jv) = \frac{(v_1 v_2)^L}{(v + jv_1)^L (v - jv_2)^L} \exp\left[ \frac{v_1 v_2 (jv\alpha_2 - v^2 \alpha_1)}{(v + jv_1)(v - jv_2)} \right] \tag{B-7}$$

where 

$$\alpha_1 = \sum_{k=1}^{L} \alpha_{1k}, \qquad \alpha_2 = \sum_{k=1}^{L} \alpha_{2k} \tag{B-8}$$

The result B-7 is substituted for $\psi_D(jv)$ in Equation B-4, and we obtain 

$$P_b = -\frac{(v_1 v_2)^L}{2\pi j} \int_{-\infty + j\varepsilon}^{\infty + j\varepsilon} \frac{dv}{v (v + jv_1)^L (v - jv_2)^L} \exp\left[ \frac{v_1 v_2 (jv\alpha_2 - v^2 \alpha_1)}{(v + jv_1)(v - jv_2)} \right] \tag{B-9}$$

This integral is evaluated as follows. 

The first step is to express the exponential function in the form 

$$\exp\left( -A_1 + \frac{jA_2}{v + jv_1} - \frac{jA_3}{v - jv_2} \right)$$

where one can easily verify that the constants $A_1$, $A_2$, and $A_3$ are given as 

$$\begin{aligned} A_1 &= \alpha_1 v_1 v_2 \\ A_2 &= \frac{v_1^2 v_2}{v_1 + v_2} (\alpha_1 v_1 + \alpha_2) \\ A_3 &= \frac{v_1 v_2^2}{v_1 + v_2} (\alpha_1 v_2 - \alpha_2) \end{aligned} \tag{B-10}$$
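
The partial-fraction form of the exponent can be spot-checked numerically. The following sketch evaluates the exponent of Equation B-9 directly and through the constants of Equation B-10 at an arbitrary complex test point. The function names and the numerical values of $v$, $v_1$, $v_2$, $\alpha_1$, and $\alpha_2$ are illustrative assumptions, not values taken from the text.

```python
def exponent_direct(v, v1, v2, alpha1, alpha2):
    """Exponent of Equation B-9, evaluated as written."""
    return (v1 * v2 * (1j * v * alpha2 - v * v * alpha1)
            / ((v + 1j * v1) * (v - 1j * v2)))

def exponent_partial_fractions(v, v1, v2, alpha1, alpha2):
    """Same exponent written as -A1 + jA2/(v + jv1) - jA3/(v - jv2)."""
    A1 = alpha1 * v1 * v2
    A2 = v1 ** 2 * v2 / (v1 + v2) * (alpha1 * v1 + alpha2)
    A3 = v1 * v2 ** 2 / (v1 + v2) * (alpha1 * v2 - alpha2)
    return -A1 + 1j * A2 / (v + 1j * v1) - 1j * A3 / (v - 1j * v2)

# Arbitrary test point on a path displaced slightly above the real axis.
v = 0.7 + 0.1j
lhs = exponent_direct(v, 1.3, 2.1, 0.8, 0.5)
rhs = exponent_partial_fractions(v, 1.3, 2.1, 0.8, 0.5)
difference = abs(lhs - rhs)
```

The two expressions agree to rounding error, as expected from matching the value at infinity and the residues at $v = -jv_1$ and $v = jv_2$.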


Second, a conformal transformation is made from the $v$ plane onto the $p$ plane via 
the change in variable 

$$p = -\frac{v_1}{v_2}\, \frac{v - jv_2}{v + jv_1} \tag{B-11}$$

In the $p$ plane, the integral given by Equation B-9 becomes 

$$P_b = \frac{\exp\left[ v_1 v_2 (-2\alpha_1 v_1 v_2 + \alpha_2 v_1 - \alpha_2 v_2)/(v_1 + v_2)^2 \right]}{(1 + v_2/v_1)^{2L-1}}\, \frac{1}{2\pi j} \oint_\Gamma f(p)\, dp \tag{B-12}$$

where 

$$f(p) = \frac{[1 + (v_2/v_1)p]^{2L-1}}{p^L (1 - p)} \exp\left[ \frac{A_2 (v_2/v_1)}{v_1 + v_2}\, p + \frac{A_3 (v_1/v_2)}{v_1 + v_2}\, \frac{1}{p} \right] \tag{B-13}$$

and $\Gamma$ is a circular contour of radius less than unity that encloses the origin. 

The third step is to evaluate the integral 

$$\frac{1}{2\pi j} \oint_\Gamma f(p)\, dp = \frac{1}{2\pi j} \oint_\Gamma \frac{[1 + (v_2/v_1)p]^{2L-1}}{p^L (1 - p)} \exp\left[ \frac{A_2 (v_2/v_1)}{v_1 + v_2}\, p + \frac{A_3 (v_1/v_2)}{v_1 + v_2}\, \frac{1}{p} \right] dp \tag{B-14}$$

In order to facilitate subsequent manipulations, the constants $a \geq 0$ and $b \geq 0$ are 
introduced and defined as follows: 

$$\tfrac{1}{2} a^2 = \frac{A_3 (v_1/v_2)}{v_1 + v_2}, \qquad \tfrac{1}{2} b^2 = \frac{A_2 (v_2/v_1)}{v_1 + v_2} \tag{B-15}$$




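Let us also expand the function $[1 + (v_2/v_1)p]^{2L-1}$ as a binomial series. As a result, 
we obtain 

$$\frac{1}{2\pi j} \oint_\Gamma f(p)\, dp = \sum_{k=0}^{2L-1} \binom{2L-1}{k} \left( \frac{v_2}{v_1} \right)^k \frac{1}{2\pi j} \oint_\Gamma \frac{1}{p^{L-k}(1 - p)} \exp\left( \frac{\tfrac{1}{2} a^2}{p} + \tfrac{1}{2} b^2 p \right) dp \tag{B-16}$$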


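The contour integral given in Equation B-16 is one representation of the Bessel 
function. It can be solved by making use of the relations 

$$I_n(ab) = \begin{cases} \dfrac{1}{2\pi j} \left( \dfrac{a}{b} \right)^n \displaystyle\oint_\Gamma \dfrac{1}{p^{n+1}} \exp\left( \dfrac{\tfrac{1}{2} a^2}{p} + \tfrac{1}{2} b^2 p \right) dp \\[2ex] \dfrac{1}{2\pi j} \left( \dfrac{b}{a} \right)^n \displaystyle\oint_\Gamma p^{n-1} \exp\left( \dfrac{\tfrac{1}{2} a^2}{p} + \tfrac{1}{2} b^2 p \right) dp \end{cases}$$

where $I_n(x)$ is the $n$th-order modified Bessel function of the first kind, and the series 
representation of Marcum's $Q$ function in terms of Bessel functions, i.e., 

$$Q_1(a, b) = \exp\left[ -\tfrac{1}{2}(a^2 + b^2) \right] \sum_{n=0}^{\infty} \left( \frac{a}{b} \right)^n I_n(ab)$$

First, consider the case $0 \leq k \leq L - 2$ in Equation B-16. In this case, the resulting 
contour integral can be written in the form† 

$$\frac{1}{2\pi j} \oint_\Gamma \frac{1}{p^{L-k}(1 - p)} \exp\left( \frac{\tfrac{1}{2} a^2}{p} + \tfrac{1}{2} b^2 p \right) dp = Q_1(a, b) \exp\left[ \tfrac{1}{2}(a^2 + b^2) \right] + \sum_{n=1}^{L-1-k} \left( \frac{b}{a} \right)^n I_n(ab) \tag{B-17}$$

Next, consider the term $k = L - 1$. The resulting contour integral can be expressed in 
terms of the $Q$ function as follows: 

$$\frac{1}{2\pi j} \oint_\Gamma \frac{1}{p(1 - p)} \exp\left( \frac{\tfrac{1}{2} a^2}{p} + \tfrac{1}{2} b^2 p \right) dp = Q_1(a, b) \exp\left[ \tfrac{1}{2}(a^2 + b^2) \right] \tag{B-18}$$

†This contour integral is related to the generalized Marcum $Q$ function, defined as 

$$Q_m(a, b) = \int_b^{\infty} x (x/a)^{m-1} \exp\left[ -\tfrac{1}{2}(x^2 + a^2) \right] I_{m-1}(ax)\, dx, \qquad m \geq 1$$

in the following manner: 

$$Q_m(a, b) \exp\left[ \tfrac{1}{2}(a^2 + b^2) \right] = \frac{1}{2\pi j} \oint_\Gamma \frac{1}{p^m (1 - p)} \exp\left( \frac{\tfrac{1}{2} a^2}{p} + \tfrac{1}{2} b^2 p \right) dp$$

Finally, consider the case $L \leq k \leq 2L - 1$. We have 

$$\begin{aligned} \frac{1}{2\pi j} \oint_\Gamma \frac{p^{k-L}}{1 - p} \exp\left( \frac{\tfrac{1}{2} a^2}{p} + \tfrac{1}{2} b^2 p \right) dp &= \sum_{n=k+1-L}^{\infty} \left( \frac{a}{b} \right)^n I_n(ab) \\ &= Q_1(a, b) \exp\left[ \tfrac{1}{2}(a^2 + b^2) \right] - \sum_{n=0}^{k-L} \left( \frac{a}{b} \right)^n I_n(ab) \end{aligned} \tag{B-19}$$

Collecting the terms that are indicated on the right-hand side of Equation B-16 and 
using the results given in Equations B-17 to B-19, the following expression for the 
contour integral is obtained after some algebra: 

$$\begin{aligned} \frac{1}{2\pi j} \oint_\Gamma f(p)\, dp ={}& \left( 1 + \frac{v_2}{v_1} \right)^{2L-1} \left\{ \exp\left[ \tfrac{1}{2}(a^2 + b^2) \right] Q_1(a, b) - I_0(ab) \right\} \\ &+ I_0(ab) \sum_{k=0}^{L-1} \binom{2L-1}{k} \left( \frac{v_2}{v_1} \right)^k \\ &+ \sum_{n=1}^{L-1} I_n(ab) \sum_{k=0}^{L-1-n} \binom{2L-1}{k} \left[ \left( \frac{b}{a} \right)^n \left( \frac{v_2}{v_1} \right)^k - \left( \frac{a}{b} \right)^n \left( \frac{v_2}{v_1} \right)^{2L-1-k} \right] \end{aligned} \tag{B-20}$$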


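Equation B-20 in conjunction with Equation B-12 gives the result for the probability 
of error. A further simplification results when one uses the following identity, 
which can easily be proved: 

$$\exp\left[ \frac{v_1 v_2}{(v_1 + v_2)^2} (-2\alpha_1 v_1 v_2 + \alpha_2 v_1 - \alpha_2 v_2) \right] = \exp\left[ -\tfrac{1}{2}(a^2 + b^2) \right]$$

Therefore, it follows that 

$$\begin{aligned} P_b ={}& Q_1(a, b) - I_0(ab) \exp\left[ -\tfrac{1}{2}(a^2 + b^2) \right] \\ &+ \frac{I_0(ab) \exp\left[ -\tfrac{1}{2}(a^2 + b^2) \right]}{(1 + v_2/v_1)^{2L-1}} \sum_{k=0}^{L-1} \binom{2L-1}{k} \left( \frac{v_2}{v_1} \right)^k \\ &+ \frac{\exp\left[ -\tfrac{1}{2}(a^2 + b^2) \right]}{(1 + v_2/v_1)^{2L-1}} \sum_{n=1}^{L-1} I_n(ab) \sum_{k=0}^{L-1-n} \binom{2L-1}{k} \left[ \left( \frac{b}{a} \right)^n \left( \frac{v_2}{v_1} \right)^k - \left( \frac{a}{b} \right)^n \left( \frac{v_2}{v_1} \right)^{2L-1-k} \right], \qquad L > 1 \\ P_b ={}& Q_1(a, b) - \frac{v_2/v_1}{1 + v_2/v_1}\, I_0(ab) \exp\left[ -\tfrac{1}{2}(a^2 + b^2) \right], \qquad L = 1 \end{aligned} \tag{B-21}$$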
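
Equation B-21 is straightforward to evaluate numerically. The sketch below is one possible implementation using only the standard library: the Bessel functions are summed from their power series and $Q_1(a, b)$ from the Bessel-function series quoted earlier (with the complementary series when $a \geq b$). The function names, the parameter `rho` standing for $v_2/v_1$, and the fixed number of series terms are assumptions of this example, and the truncated series are only adequate for moderate values of $ab$.

```python
import math

def bessel_i(n, x, terms=50):
    """Modified Bessel function of the first kind I_n(x), via its power series."""
    return sum((x / 2.0) ** (2 * m + n)
               / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def marcum_q1(a, b, terms=50):
    """Marcum Q_1(a, b) from its series in modified Bessel functions."""
    if b == 0:
        return 1.0
    pref = math.exp(-(a * a + b * b) / 2.0)
    if a < b:
        return pref * sum((a / b) ** n * bessel_i(n, a * b) for n in range(terms))
    # Complementary series, which converges when a >= b.
    return 1.0 - pref * sum((b / a) ** n * bessel_i(n, a * b)
                            for n in range(1, terms))

def pb_multichannel(a, b, rho, L):
    """Error probability of Equation B-21, with rho = v2/v1."""
    e = math.exp(-(a * a + b * b) / 2.0)
    i0 = bessel_i(0, a * b)
    if L == 1:
        return marcum_q1(a, b) - rho / (1.0 + rho) * i0 * e
    if a <= 0 or b <= 0:
        raise ValueError("this sketch needs a > 0 and b > 0 when L > 1")
    scale = (1.0 + rho) ** (2 * L - 1)
    first = sum(math.comb(2 * L - 1, k) * rho ** k for k in range(L))
    second = 0.0
    for n in range(1, L):
        inner = sum(math.comb(2 * L - 1, k)
                    * ((b / a) ** n * rho ** k
                       - (a / b) ** n * rho ** (2 * L - 1 - k))
                    for k in range(L - n))
        second += bessel_i(n, a * b) * inner
    return marcum_q1(a, b) - i0 * e + i0 * e * first / scale + e * second / scale

# Single channel, a = 0, v1 = v2: the result reduces to (1/2) exp(-b^2 / 2).
p_single = pb_multichannel(0.0, 2.0, 1.0, 1)
```

With $a = 0$, $v_2/v_1 = 1$, and $L = 1$, the formula reduces to $\tfrac{1}{2}\exp(-\tfrac{1}{2}b^2)$, which gives a convenient check on the code.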
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This is the desired expression for the probability of error. It is now a simple matter 
to relate the parameters $a$ and $b$ to the moments of the pairs $\{X_k, Y_k\}$. Substituting for 
$A_2$ and $A_3$ from Equation B-10 into Equation B-15, we obtain 

$$a = \left[ \frac{2 v_1^2 v_2 (\alpha_1 v_2 - \alpha_2)}{(v_1 + v_2)^2} \right]^{1/2}, \qquad b = \left[ \frac{2 v_1 v_2^2 (\alpha_1 v_1 + \alpha_2)}{(v_1 + v_2)^2} \right]^{1/2} \tag{B-22}$$

Since $v_1$, $v_2$, $\alpha_1$, and $\alpha_2$ have been given in Equations B-6 and B-8 directly in terms of 
the moments of the pairs $X_k$ and $Y_k$, our task is completed. 


APPENDIX C 


Error Probabilities for Adaptive Reception 
of M-Phase Signals 


In this appendix, we derive probabilities of error for two- and four-phase signaling 
over an L-diversity-branch time-invariant Gaussian noise channel and for M-phase 
signaling over an L-diversity-branch Rayleigh fading additive Gaussian noise channel. 
Both channels corrupt the signaling waveforms transmitted through them by introducing 
additive white Gaussian noise and an unknown or random multiplicative gain and phase 
shift in the transmitted signal. The receiver processing consists of cross-correlating the 
signal plus noise received over each diversity branch by a noisy reference signal, which 
is derived either from the previously received information-bearing signals or from the 
transmission and reception of a pilot signal, and adding the outputs from all L diversity 
branches to form the decision variable. 


■ C.1 

MATHEMATICAL MODEL FOR AN M-PHASE SIGNALING 
COMMUNICATION SYSTEM 


In the general case of M-phase signaling, the signaling waveforms at the transmitter 
are† 

$$s_n(t) = \mathrm{Re}\left[ s_{ln}(t)\, e^{j2\pi f_c t} \right]$$

where 

$$s_{ln}(t) = g(t) \exp\left[ j\frac{2\pi}{M}(n - 1) \right], \qquad n = 1, 2, \ldots, M, \quad 0 \leq t \leq T \tag{C-1}$$

and $T$ is the time duration of the signaling interval. 

†The complex representation of real signals is used throughout. Complex conjugation is denoted by an 
asterisk. 

Consider the case in which one of these $M$ waveforms is transmitted, for the 
duration of the signaling interval, over $L$ channels. Assume that each of the channels 
corrupts the signaling waveform transmitted through it by introducing a multiplicative 
gain and phase shift, represented by the complex-valued number $g_k$, and an additive 
noise $z_k(t)$. Thus, when the transmitted waveform is $s_{ln}(t)$, the waveform received over 
the $k$th channel is 

$$r_{lk}(t) = g_k s_{ln}(t) + z_k(t), \qquad 0 \leq t \leq T, \quad k = 1, 2, \ldots, L \tag{C-2}$$

The noises $\{z_k(t)\}$ are assumed to be sample functions of a stationary white Gaussian 
random process with zero mean and autocorrelation function $\phi_z(\tau) = N_0 \delta(\tau)$, where 
$N_0$ is the value of the spectral density. These sample functions are assumed to be 
mutually statistically independent. 

At the demodulator, $r_{lk}(t)$ is passed through a filter whose impulse response is 
matched to the waveform $g(t)$. The output of this filter, sampled at time $t = T$, is 
denoted as 

$$X_k = 2\mathcal{E} g_k \exp\left[ j\frac{2\pi}{M}(n - 1) \right] + N_k \tag{C-3}$$

where $\mathcal{E}$ is the transmitted signal energy per channel and $N_k$ is the noise sample from the 
$k$th filter. In order for the demodulator to decide which of the $M$ phases was transmitted 
in the signaling interval $0 \leq t \leq T$, it attempts to undo the phase shift introduced by 
each channel. In practice, this is accomplished by multiplying the matched filter output 
$X_k$ by the complex conjugate of an estimate $\hat{g}_k$ of the channel gain and phase shift. 
The result is a weighted and phase-shifted sampled output from the $k$th-channel filter, 
which is then added to the weighted and phase-shifted sampled outputs from the other 
$L - 1$ channel filters. 

The estimate $\hat{g}_k$ of the gain and phase shift of the $k$th channel is assumed to be 
derived either from the transmission of a pilot signal or by undoing the modulation on 
the information-bearing signals received in previous signaling intervals. As an example 
of the former, suppose that a pilot signal, denoted by $s_{pk}(t)$, $0 \leq t \leq T$, is transmitted 
over the $k$th channel for the purpose of measuring the channel gain and phase shift. The 
received waveform is 

$$g_k s_{pk}(t) + z_{pk}(t), \qquad 0 \leq t \leq T$$

where $z_{pk}(t)$ is a sample function of a stationary white Gaussian random process with 
zero mean and autocorrelation function $\phi_p(\tau) = N_0 \delta(\tau)$. This signal plus noise is 
passed through a filter matched to $s_{pk}(t)$. The filter output is sampled at time $t = T$ to 
yield the random variable $X_{pk} = 2\mathcal{E}_p g_k + N_{pk}$, where $\mathcal{E}_p$ is the energy in the pilot signal, 
which is assumed to be identical for all channels, and $N_{pk}$ is the additive noise sample. 
An estimate of $g_k$ is obtained by properly normalizing $X_{pk}$, i.e., $\hat{g}_k = g_k + N_{pk}/2\mathcal{E}_p$. 

On the other hand, an estimate of $g_k$ can be obtained from the information-bearing 
signal as follows. If one knew the information component contained in the matched 
filter output, then an estimate of $g_k$ could be obtained by properly normalizing this 
output. For example, the information component in the filter output given in 
Equation C-3 is $2\mathcal{E} g_k \exp[j(2\pi/M)(n - 1)]$, and, hence, the estimate is 

$$\hat{g}_k = g_k + \frac{N_k'}{2\mathcal{E}}$$

where $N_k' = N_k \exp[-j(2\pi/M)(n - 1)]$ and the PDF of $N_k'$ is identical to the PDF of 
$N_k$. An estimate that is obtained from the information-bearing signal in this manner 
is called a clairvoyant estimate. Although a physically realizable receiver does not 
possess such clairvoyance, it can approximate this estimate by employing a time delay 
of one signaling interval and by feeding back the estimate of the transmitted phase in 
the previous signaling interval. 

Whether the estimate of $g_k$ is obtained from a pilot signal or from the information-bearing 
signal, the estimate can be improved by extending the time interval over which 
it is formed to include several prior signaling intervals in a way that has been described 
by Price (1962a, b). As a result of extending the measurement interval, the signal-to-noise 
ratio in the estimate of $g_k$ is increased. In the general case where the estimation 
interval is the infinite past, the normalized pilot signal estimate is 

$$\hat{g}_k = g_k + \frac{\sum_{i=1}^{\infty} c_i N_{pki}}{2\mathcal{E}_p \sum_{i=1}^{\infty} c_i} \tag{C-4}$$

where $c_i$ is the weighting coefficient on the subestimate of $g_k$ derived from the $i$th prior 
signaling interval and $N_{pki}$ is the sample of additive Gaussian noise at the output of the filter 
matched to $s_{pk}(t)$ in the $i$th prior signaling interval. Similarly, the clairvoyant estimate 
that is obtained from the information-bearing signal by undoing the modulation over 
the infinite past is 

$$\hat{g}_k = g_k + \frac{\sum_{i=1}^{\infty} c_i N_{ki}}{2\mathcal{E} \sum_{i=1}^{\infty} c_i} \tag{C-5}$$


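As indicated, the demodulator forms the product between $\hat{g}_k^*$ and $X_k$ and adds this 
to the products of the other $L - 1$ channels. The random variable that results is 

$$z = \sum_{k=1}^{L} X_k \hat{g}_k^* = \sum_{k=1}^{L} X_k Y_k^* = z_r + j z_i \tag{C-6}$$

where, by definition, $Y_k = \hat{g}_k$, $z_r = \mathrm{Re}(z)$, and $z_i = \mathrm{Im}(z)$. The phase of $z$ is the 
decision variable. This is simply 

$$\theta = \tan^{-1}\left( \frac{z_i}{z_r} \right) = \tan^{-1}\left[ \frac{\mathrm{Im}\left( \sum_{k=1}^{L} X_k Y_k^* \right)}{\mathrm{Re}\left( \sum_{k=1}^{L} X_k Y_k^* \right)} \right] \tag{C-7}$$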
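
The combining rule of Equations C-6 and C-7 can be sketched in a few lines. In the example below, the function name `phase_decision` is hypothetical, and the final step of mapping the phase to the nearest of the $M$ transmitted phases $2\pi(n - 1)/M$ is an assumption of this illustration, since the text only identifies the phase of $z$ as the decision variable.

```python
import cmath
import math

def phase_decision(x, y, M):
    """Combine L branches as in C-6 and return (theta, n).

    x: matched filter outputs X_k, y: channel estimates Y_k.
    theta is the phase of z = sum_k X_k * conj(Y_k) (Equation C-7); n is the
    index of the nearest transmitted phase 2*pi*(n - 1)/M, an assumed
    minimum-distance decision rule.
    """
    z = sum(complex(xk) * complex(yk).conjugate() for xk, yk in zip(x, y))
    theta = cmath.phase(z)  # four-quadrant arctangent of Im(z)/Re(z)
    n = int(round(theta / (2.0 * math.pi / M))) % M + 1
    return theta, n

# Noise-free illustration with perfect estimates on L = 2 branches and the
# signal energy set to 1, so X_k = 2 g_k exp[j(2*pi/M)(n - 1)] with n = 3, M = 4.
g = [0.8 + 0.3j, -0.2 + 1.1j]
x = [2.0 * gk * cmath.exp(1j * 2.0 * math.pi * (3 - 1) / 4) for gk in g]
theta, n_hat = phase_decision(x, g, 4)
```

In this noise-free case every term $X_k Y_k^*$ has the same phase $\pi$, so the branches add coherently and the decision returns $n = 3$.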


■ C.2 

CHARACTERISTIC FUNCTION AND PROBABILITY DENSITY 
FUNCTION OF THE PHASE $\theta$ 

The following derivation is based on the assumption that the transmitted signal phase 
is zero, i.e., $n = 1$. If desired, the PDF of $\theta$ conditional on any other transmitted signal 
phase can be obtained by translating $p(\theta)$ by the angle $2\pi(n - 1)/M$. We also assume 
that the complex-valued numbers $\{g_k\}$, which characterize the $L$ channels, are mutually 
statistically independent and identically distributed zero-mean Gaussian random 
variables. This characterization is appropriate for slowly fading Rayleigh channels. 
As a consequence, the random variables $(X_k, Y_k)$ are correlated, complex-valued, 
zero-mean, Gaussian, and statistically independent, but identically distributed with any other 
pair $(X_i, Y_i)$. 

The method that has been used in evaluating the probability density $p(\theta)$ in the 
general case of diversity reception is as follows. First, the characteristic function of the 
joint probability distribution function of $z_r$ and $z_i$, where $z_r$ and $z_i$ are two components 
that make up the decision variable $\theta$, is obtained. Second, the double Fourier transform 
of the characteristic function is performed and yields the density $p(z_r, z_i)$. Then the 
transformation 

$$r = \sqrt{z_r^2 + z_i^2}, \qquad \theta = \tan^{-1}\left( \frac{z_i}{z_r} \right) \tag{C-8}$$

yields the joint PDF of the envelope $r$ and the phase $\theta$. Finally, integration of this joint 
PDF over the random variable $r$ yields the PDF of $\theta$. 

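The joint characteristic function of the random variables $z_r$ and $z_i$ can be expressed 
in the form 

$$\psi(jv_1, jv_2) = \left[ \frac{\dfrac{4}{m_{xx} m_{yy} (1 - |\mu|^2)}}{\left( v_1 - j\dfrac{2|\mu| \cos\varepsilon}{\sqrt{m_{xx} m_{yy}}\,(1 - |\mu|^2)} \right)^2 + \left( v_2 - j\dfrac{2|\mu| \sin\varepsilon}{\sqrt{m_{xx} m_{yy}}\,(1 - |\mu|^2)} \right)^2 + \dfrac{4}{m_{xx} m_{yy} (1 - |\mu|^2)^2}} \right]^L \tag{C-9}$$

where, by definition, 

$$\begin{aligned} m_{xx} &= E(|X_k|^2), && \text{identical for all } k \\ m_{yy} &= E(|Y_k|^2), && \text{identical for all } k \\ m_{xy} &= E(X_k Y_k^*), && \text{identical for all } k \\ \mu &= \frac{m_{xy}}{\sqrt{m_{xx} m_{yy}}} = |\mu| e^{-j\varepsilon} \end{aligned} \tag{C-10}$$

The result of Fourier-transforming the function $\psi(jv_1, jv_2)$ with respect to the 
variables $v_1$ and $v_2$ is 

$$p(z_r, z_i) = \frac{(1 - |\mu|^2)^L}{(L - 1)!\, \pi 2^L} \left( \sqrt{z_r^2 + z_i^2} \right)^{L-1} \exp\left[ |\mu| (z_r \cos\varepsilon + z_i \sin\varepsilon) \right] K_{L-1}\left( \sqrt{z_r^2 + z_i^2} \right) \tag{C-11}$$

where $K_n(x)$ is the modified Hankel function of order $n$. (In Equation C-11, $z_r$ and $z_i$ are expressed in units of 
$\tfrac{1}{2}\sqrt{m_{xx} m_{yy}}\,(1 - |\mu|^2)$; this scaling does not affect the phase.) Then the transformation of 
random variables, as indicated in Equation C-8, yields the joint PDF of the envelope $r$ 
and the phase $\theta$ in the form 

$$p(r, \theta) = \frac{(1 - |\mu|^2)^L}{(L - 1)!\, \pi 2^L}\, r^L \exp\left[ |\mu| r \cos(\theta - \varepsilon) \right] K_{L-1}(r) \tag{C-12}$$

Now, integration over the variable $r$ yields the marginal PDF of the phase $\theta$. We have 
evaluated the integral to obtain $p(\theta)$ in the form 

$$p(\theta) = \frac{(-1)^{L-1} (1 - |\mu|^2)^L}{2\pi (L - 1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left\{ \frac{1}{b - |\mu|^2 \cos^2(\theta - \varepsilon)} + \frac{|\mu| \cos(\theta - \varepsilon)}{[b - |\mu|^2 \cos^2(\theta - \varepsilon)]^{3/2}} \cos^{-1}\left[ -\frac{|\mu| \cos(\theta - \varepsilon)}{b^{1/2}} \right] \right\} \Bigg|_{b=1} \tag{C-13}$$

In this equation, the notation 

$$\frac{\partial^L}{\partial b^L} f(b, \mu) \Big|_{b=1}$$

denotes the $L$th partial derivative of the function $f(b, \mu)$ evaluated at $b = 1$. 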


■ C.3 

ERROR PROBABILITIES FOR SLOWLY FADING RAYLEIGH CHANNELS 

In this section, the probability of a character error and the probability of a binary 
digit error are derived for M-phase signaling. The probabilities are evaluated via the 
probability density function and the probability distribution function of $\theta$. 

The probability distribution function of the phase. In order to evaluate the probability 
of error, we need to evaluate the definite integral 

$$P(\theta_1 \leq \theta \leq \theta_2) = \int_{\theta_1}^{\theta_2} p(\theta)\, d\theta$$

where $\theta_1$ and $\theta_2$ are limits of integration and $p(\theta)$ is given by Equation C-13. All 
subsequent calculations are made for a real cross-correlation coefficient $\mu$. A real-valued 
$\mu$ implies that the signals have symmetric spectra. This is the usual situation 
encountered. Since a complex-valued $\mu$ causes a shift of $\varepsilon$ in the PDF of $\theta$, i.e., $\varepsilon$ is 
simply a bias term, the results that are given for real $\mu$ can be altered in a trivial way to 
cover the more general case of complex-valued $\mu$. 

In the integration of $p(\theta)$, only the range $0 \leq \theta \leq \pi$ is considered, because $p(\theta)$ 
is an even function. Furthermore, the continuity of the integrand and its derivatives 
and the fact that the limits $\theta_1$ and $\theta_2$ are independent of $b$ allow for the interchange of 
integration and differentiation. When this is done, the resulting integral can be evaluated 
quite readily and can be expressed as follows: 

$$\int_{\theta_1}^{\theta_2} p(\theta)\, d\theta = \frac{(-1)^{L-1} (1 - \mu^2)^L}{2\pi (L - 1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left\{ \frac{1}{b - \mu^2} \left[ \frac{\mu \sqrt{1 - (b/\mu^2 - 1) x^2}}{b^{1/2}} \cot^{-1} x - \cot^{-1}\left( \frac{x b^{1/2}/\mu}{\sqrt{1 - (b/\mu^2 - 1) x^2}} \right) \right]_{x_1}^{x_2} \right\} \Bigg|_{b=1} \tag{C-14}$$

where, by definition, 

$$x_i = \frac{-\mu \cos\theta_i}{\sqrt{b - \mu^2 \cos^2\theta_i}}, \qquad i = 1, 2 \tag{C-15}$$


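Probability of a symbol error. The probability of a symbol error for any M-phase 
signaling system is 

$$P_M = 2 \int_{\pi/M}^{\pi} p(\theta)\, d\theta$$

When Equation C-14 is evaluated at these two limits, the result is 

$$P_M = \frac{(-1)^{L-1} (1 - \mu^2)^L}{\pi (L - 1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left\{ \frac{1}{b - \mu^2} \left[ \frac{\pi}{M}(M - 1) - \frac{\mu \sin(\pi/M)}{\sqrt{b - \mu^2 \cos^2(\pi/M)}} \cot^{-1}\left( \frac{-\mu \cos(\pi/M)}{\sqrt{b - \mu^2 \cos^2(\pi/M)}} \right) \right] \right\} \Bigg|_{b=1} \tag{C-16}$$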


Probability of a binary digit error. First, let us consider two-phase signaling. In 
this case, the probability of a binary digit error is obtained by integrating the PDF $p(\theta)$ 
over the range $\tfrac{1}{2}\pi < \theta < \tfrac{3}{2}\pi$. Since $p(\theta)$ is an even function and the signals are a priori 
equally likely, this probability can be written as 

$$P_2 = 2 \int_{\pi/2}^{\pi} p(\theta)\, d\theta$$

It is easily verified that $\theta_1 = \tfrac{1}{2}\pi$ implies $x_1 = 0$ and $\theta_2 = \pi$ implies $x_2 = \mu/\sqrt{b - \mu^2}$. 
Thus, 

$$P_2 = \frac{(-1)^{L-1} (1 - \mu^2)^L}{2 (L - 1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left[ \frac{1}{b - \mu^2} - \frac{\mu}{b^{1/2} (b - \mu^2)} \right] \Bigg|_{b=1} \tag{C-17}$$

After performing the differentiation indicated in Equation C-17 and evaluating the 
resulting function at $b = 1$, the probability of a binary digit error is obtained in 
the form 

$$P_2 = \frac{1}{2} \left[ 1 - \mu \sum_{k=0}^{L-1} \binom{2k}{k} \left( \frac{1 - \mu^2}{4} \right)^k \right] \tag{C-18}$$
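
Equation C-18 is a finite sum and is easy to evaluate. The following sketch, with the hypothetical function name `p2_rayleigh`, computes the binary error probability for a given real cross-correlation coefficient $\mu$ and diversity order $L$.

```python
import math

def p2_rayleigh(mu, L):
    """Binary error probability of Equation C-18 for real mu and L branches."""
    s = sum(math.comb(2 * k, k) * ((1.0 - mu * mu) / 4.0) ** k
            for k in range(L))
    return 0.5 * (1.0 - mu * s)

# With mu = 0.5, adding a second diversity branch lowers the error probability.
p_one = p2_rayleigh(0.5, 1)
p_two = p2_rayleigh(0.5, 2)
```

For $L = 1$ the sum collapses to $P_2 = \tfrac{1}{2}(1 - \mu)$, so `p_one` is 0.25; for $L = 2$ the formula gives $\tfrac{1}{2}[1 - \mu(1 + \tfrac{1}{2}(1 - \mu^2))]$, so `p_two` is 0.15625. The limiting values $\mu = 0$ and $\mu = 1$ give error probabilities of 1/2 and 0, respectively, for any $L$.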


Next, we consider the case of four-phase signaling in which a Gray code is used to map 
pairs of bits into phases. Assuming again that the transmitted signal is $s_{l1}(t)$, it is clear 
that a single error is committed when the received phase is $\tfrac{1}{4}\pi < \theta < \tfrac{3}{4}\pi$, and a double 
error is committed when the received phase is $\tfrac{3}{4}\pi < \theta < \pi$. That is, the probability of 
a binary digit error is 

$$P_{4b} = \int_{\pi/4}^{3\pi/4} p(\theta)\, d\theta + 2 \int_{3\pi/4}^{\pi} p(\theta)\, d\theta \tag{C-19}$$

It is easily established from Equations C-14 and C-19 that 

$$P_{4b} = \frac{(-1)^{L-1} (1 - \mu^2)^L}{2 (L - 1)!}\, \frac{\partial^{L-1}}{\partial b^{L-1}} \left[ \frac{1}{b - \mu^2} - \frac{\mu}{(b - \mu^2)(2b - \mu^2)^{1/2}} \right] \Bigg|_{b=1}$$

Hence, the probability of a binary digit error for four-phase signaling is 

$$P_{4b} = \frac{1}{2} \left[ 1 - \frac{\mu}{\sqrt{2 - \mu^2}} \sum_{k=0}^{L-1} \binom{2k}{k} \left( \frac{1 - \mu^2}{4 - 2\mu^2} \right)^k \right] \tag{C-20}$$

Note that if one defines the quantity $\rho = \mu/\sqrt{2 - \mu^2}$, the expression for $P_{4b}$ in terms 
of $\rho$ is 

$$P_{4b} = \frac{1}{2} \left[ 1 - \rho \sum_{k=0}^{L-1} \binom{2k}{k} \left( \frac{1 - \rho^2}{4} \right)^k \right] \tag{C-21}$$

In other words, $P_{4b}$ has the same form as $P_2$ given in Equation C-18. Furthermore, note 
that $\rho$, just like $\mu$, can be interpreted as a cross-correlation coefficient, since the range 
of $\rho$ is $0 \leq \rho \leq 1$ for $0 \leq \mu \leq 1$. This simple fact will be used in Section C.4. 

The above procedure for obtaining the bit error probability for an M-phase signal 
with a Gray code can be used to generate results for $M = 8$, 16, etc., as shown by 
Proakis (1968). 


Evaluation of the cross-correlation coefficient. The expressions for the probabilities 
of error given above depend on a single parameter, namely, the cross-correlation 
coefficient $\mu$. The clairvoyant estimate is given by Equation C-5, and the matched filter 
output, when signal waveform $s_{l1}(t)$ is transmitted, is $X_k = 2\mathcal{E} g_k + N_k$. Hence, the 
cross-correlation coefficient is 

$$\mu = \frac{\sqrt{v}}{\sqrt{(\gamma_c^{-1} + 1)(\gamma_c^{-1} + v)}} \tag{C-22}$$

where, by definition, 

$$v = \frac{\left| \sum_{i=1}^{\infty} c_i \right|^2}{\sum_{i=1}^{\infty} |c_i|^2}, \qquad \gamma_c = \frac{\mathcal{E}}{N_0} E(|g_k|^2), \quad k = 1, 2, \ldots, L \tag{C-23}$$

The parameter $v$ represents the effective number of signaling intervals over which the 
estimate is formed, and $\gamma_c$ is the average SNR per channel. 

In the case of differential phase signaling, the weighting coefficients are $c_1 = 1$, 
$c_i = 0$ for $i \neq 1$. Hence, $v = 1$ and $\mu = \gamma_c/(1 + \gamma_c)$. 

When $v = \infty$, the estimate is perfect and 

$$\lim_{v \to \infty} \mu = \sqrt{\frac{\gamma_c}{\gamma_c + 1}}$$

Finally, in the case of a pilot signal estimate given by Equation C-4, the 
cross-correlation coefficient is 

$$\mu = \left[ \left( 1 + \frac{r + 1}{r \gamma_t} \right) \left( 1 + \frac{r + 1}{v \gamma_t} \right) \right]^{-1/2} \tag{C-24}$$

where, by definition, 

$$\gamma_t = \frac{\mathcal{E}_t}{N_0} E(|g_k|^2), \qquad \mathcal{E}_t = \mathcal{E} + \mathcal{E}_p, \qquad r = \mathcal{E}/\mathcal{E}_p$$

The values of $\mu$ given above are summarized in Table C-1. 


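TABLE C-1 

Rayleigh Fading Channel 

Type of estimate                Cross-correlation coefficient $\mu$ 

Clairvoyant estimate            $\dfrac{\sqrt{v}}{\sqrt{(\gamma_c^{-1} + 1)(\gamma_c^{-1} + v)}}$ 

Pilot signal estimate           $\dfrac{\sqrt{rv}}{(r + 1) \sqrt{\left( \dfrac{1}{\gamma_t} + \dfrac{r}{r + 1} \right) \left( \dfrac{1}{\gamma_t} + \dfrac{v}{r + 1} \right)}}$ 

Differential phase signaling    $\dfrac{\gamma_c}{\gamma_c + 1}$ 

Perfect estimate                $\sqrt{\dfrac{\gamma_c}{\gamma_c + 1}}$ 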
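
The entries of Table C-1 can be compared numerically. The sketch below implements Equations C-22 and C-24 together with the two special cases; the function names are hypothetical, and the arguments `gamma_c`, `gamma_t`, `r`, and `v` correspond to $\gamma_c$, $\gamma_t$, $r$, and $v$ as defined in the text.

```python
import math

def mu_clairvoyant(gamma_c, v):
    """Cross-correlation coefficient for the clairvoyant estimate (C-22)."""
    return math.sqrt(v) / math.sqrt((1.0 / gamma_c + 1.0) * (1.0 / gamma_c + v))

def mu_pilot(gamma_t, r, v):
    """Cross-correlation coefficient for the pilot signal estimate (C-24)."""
    return 1.0 / math.sqrt((1.0 + (r + 1.0) / (r * gamma_t))
                           * (1.0 + (r + 1.0) / (v * gamma_t)))

def mu_differential(gamma_c):
    """Differential phase signaling: the clairvoyant case with v = 1."""
    return gamma_c / (gamma_c + 1.0)

def mu_perfect(gamma_c):
    """Perfect estimate: the clairvoyant case in the limit of infinite v."""
    return math.sqrt(gamma_c / (gamma_c + 1.0))

# Average SNR per channel of 10 (10 dB) as an arbitrary illustration.
mu_dpsk = mu_clairvoyant(10.0, 1.0)
mu_long = mu_clairvoyant(10.0, 1e12)
```

Setting $v = 1$ in the clairvoyant expression reproduces the differential phase signaling entry, and letting $v$ grow very large approaches the perfect-estimate entry, which is a useful consistency check on the table.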



■ C.4 

ERROR PROBABILITIES FOR TIME-INVARIANT 
AND RICEAN FADING CHANNELS 

In Section C.2, the complex-valued channel gains $\{g_k\}$ were characterized as zero-mean 
Gaussian random variables, which is appropriate for Rayleigh fading channels. In this 
section, the channel gains $\{g_k\}$ are assumed to be nonzero-mean Gaussian random 
variables. Estimates of the channel gains are formed by the demodulator and are used 
as described in Section C.1. Moreover, the decision variable $\theta$ is defined again by 
Equation C-7. However, in this case, the Gaussian random variables $X_k$ and $Y_k$, which 
denote the matched filter output and the estimate, respectively, for the $k$th channel, have 
nonzero means, which are denoted by $\bar{X}_k$ and $\bar{Y}_k$. Furthermore, the second moments 
are 

$$\begin{aligned} m_{xx} &= E(|X_k - \bar{X}_k|^2), && \text{identical for all channels} \\ m_{yy} &= E(|Y_k - \bar{Y}_k|^2), && \text{identical for all channels} \\ m_{xy} &= E[(X_k - \bar{X}_k)(Y_k^* - \bar{Y}_k^*)], && \text{identical for all channels} \end{aligned}$$

and the normalized covariance is defined as 

$$\mu = \frac{m_{xy}}{\sqrt{m_{xx} m_{yy}}}$$

Error probabilities are given below only for two- and four-phase signaling with this 
channel model. We are interested in the special case in which the fluctuating component 
of each of the channel gains $\{g_k\}$ is zero, so that the channels are time-invariant. If, in 
addition to this time invariance, the noises between the estimate and the matched filter 
output are uncorrelated, then $\mu = 0$. 

In the general case, the probability of error for two-phase signaling over L sta- 
tistically independent channels characterized in the manner described above can be 
obtained from the results in Appendix B. In its most general form, the expression for 
the binary error rate is 


Qi(a, b ) - / 0 (a6)exp[- i(a 2 - b 2 )] 


Io(ab)exp[—}j(a 2 + b 2 )] ^-4 

[ 2/(1 - / Gp - 1 ^ 

exp[-j(a 2 + b 2 )] 


k=0 


[ 2/(1 - 

L— 1 L-l-n 

x ^2 Iniflb) ^2 


2 L - 1 
k 


2 L - 1 
k 




1 — ix 


1 + ft 
l — /X 


k= 1 k=0 

Qi(a, b) — 1(1 + ix)Io(ab)exp[—Ua 2 + b 2 )] (L = 1) 


1 + ft 
l — /x 


2L—l—k 


(L> 2 ) 


(C— 25) 
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where, by definition, 


a = 


b = 


Q\(a, b) = 




i(a 2 + x 2 )]Io(ax) dx 


(C— 26) 


I n (x) is the modified Bessel function of the first kind and of order n. 

Let us evaluate the constants $a$ and $b$ when the channel is time-invariant, $\mu = 0$,
and the channel gain and phase estimates are those given in Section C.1. Recall that
when signal $s_1(t)$ is transmitted, the matched filter output is $X_k = 2\mathcal{E}g_k + N_k$. The
clairvoyant estimate is given by Equation C-5. Hence, for this estimate, the moments are
$\bar{X}_k = 2\mathcal{E}g_k$, $\bar{Y}_k = g_k$, $m_{xx} = 4\mathcal{E}N_0$, and $m_{yy} = N_0/(\mathcal{E}\nu)$, where $\mathcal{E}$ is the signal energy, $N_0$
is the value of the noise spectral density, and $\nu$ is defined in Equation C-23. Substitution
of these moments into Equation C-26 results in the following expressions for $a$ and $b$:

$$a = \sqrt{\frac{\gamma_b}{2}}\,\left|\sqrt{\nu} - 1\right|, \qquad b = \sqrt{\frac{\gamma_b}{2}}\,\left|\sqrt{\nu} + 1\right| \tag{C-27}$$

where $\gamma_b = (\mathcal{E}/N_0)\sum_{k=1}^{L}|g_k|^2$.


This is a result originally derived by Price (1962). 

The probability of error for differential phase signaling can be obtained by setting
$\nu = 1$ in Equation C-27.
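As a numerical sanity check on this special case, the following minimal sketch evaluates the single-channel ($L = 1$) form of Equation C-25 with $\mu = 0$ and $\nu = 1$. It assumes that Equation C-27 reads $a = \sqrt{\gamma_b/2}\,|\sqrt{\nu} - 1|$ and $b = \sqrt{\gamma_b/2}\,(\sqrt{\nu} + 1)$. The function names and the Poisson-weighted series used for $Q_1(a, b)$ are illustrative choices, not part of the original derivation.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function I_0(x), summed from its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def marcum_q1(a, b, terms=200):
    """First-order Marcum Q-function Q_1(a, b).

    Uses the Poisson-mixture series
        Q_1(a, b) = sum_k Pois(k; a^2/2) * exp(-b^2/2) * sum_{j<=k} (b^2/2)^j / j!
    truncated after `terms` terms (adequate for moderate a and b).
    """
    x, y = a * a / 2.0, b * b / 2.0
    pois = math.exp(-x)    # Poisson weight for k = 0
    inner = math.exp(-y)   # exp(-y) * y^j / j! for j = 0
    cdf = inner            # running sum over j = 0..k
    total = 0.0
    for k in range(terms):
        total += pois * cdf
        pois *= x / (k + 1)
        inner *= y / (k + 1)
        cdf += inner
    return total


def pb_single_channel(a, b, mu=0.0):
    """Binary error probability of Equation C-25 for L = 1."""
    tail = bessel_i0(a * b) * math.exp(-0.5 * (a * a + b * b))
    return marcum_q1(a, b) - 0.5 * (1.0 + mu) * tail


def ab_clairvoyant(gamma_b, nu):
    """Constants a and b of Equation C-27 (assumed form, see lead-in)."""
    root = math.sqrt(gamma_b / 2.0)
    return root * abs(math.sqrt(nu) - 1.0), root * (math.sqrt(nu) + 1.0)


gamma_b = 4.0                       # example SNR per bit (linear scale)
a, b = ab_clairvoyant(gamma_b, nu=1.0)
pb = pb_single_channel(a, b)
print(pb, 0.5 * math.exp(-gamma_b))
```

With $\nu = 1$ the constants reduce to $a = 0$ and $b = \sqrt{2\gamma_b}$, so $Q_1(0, b) = e^{-\gamma_b}$ and the error probability collapses to $\tfrac{1}{2}e^{-\gamma_b}$, the familiar binary DPSK expression; the two printed values should agree.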

Next, consider a pilot signal estimate. In this case, the estimate is given by Equation
C-4 and the matched filter output is again $X_k = 2\mathcal{E}g_k + N_k$. When the moments are
calculated and these are substituted into Equation C-26, the following expressions for
$a$ and $b$ are obtained:

$$a = \sqrt{\frac{\gamma_t}{2}}\,\left|\sqrt{\frac{r}{r+1}} - \sqrt{\frac{1}{r+1}}\right|, \qquad b = \sqrt{\frac{\gamma_t}{2}}\,\left|\sqrt{\frac{r}{r+1}} + \sqrt{\frac{1}{r+1}}\right| \tag{C-28}$$

where

$$\gamma_t = \frac{\mathcal{E}_t}{N_0}\sum_{k=1}^{L}|g_k|^2, \qquad \mathcal{E}_t = \mathcal{E} + \mathcal{E}_p, \qquad r = \mathcal{E}/\mathcal{E}_p$$

Finally, we consider the probability of a binary digit error for four-phase signaling
over a time-invariant channel for which the condition $\mu = 0$ obtains. One approach
that can be used to derive this error probability is to determine the PDF of $\theta$ and then
to integrate this over the appropriate range of values of $\theta$. Unfortunately, this approach
proves to be intractable mathematically. Instead, a simpler, albeit roundabout, method
may be used that involves the Laplace transform. In short, the integral in Equation 14.4-14
of the text that relates the error probability $P_b(\gamma_b)$ in an AWGN channel to the error
probability $P_b$ in a Rayleigh fading channel is a Laplace transform. Since the bit error
probabilities $P_b$ and $P_{4b}$ for a Rayleigh fading channel, given by Equations C-18 and
C-21, respectively, have the same form but differ only in the correlation coefficient,
it follows that the bit error probabilities for the time-invariant channel also have the
same form. That is, Equation C-25 with $\mu = 0$ is also the expression for the bit error
probability of a four-phase signaling system with the parameters $a$ and $b$ modified to
reflect the difference in the correlation coefficient. The detailed derivation may be found
in the paper by Proakis (1968). The expressions for $a$ and $b$ are given in Table C-2.

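TABLE C-2

Time-Invariant Channel

Two-phase signaling

Clairvoyant estimate:
    $a = \sqrt{\gamma_b/2}\,\left|\sqrt{\nu} - 1\right|$
    $b = \sqrt{\gamma_b/2}\,\left(\sqrt{\nu} + 1\right)$

Differential phase signaling:
    $a = 0$
    $b = \sqrt{2\gamma_b}$

Pilot signal estimate:
    $a = \sqrt{\gamma_t/2}\,\left|\sqrt{r/(r+1)} - \sqrt{1/(r+1)}\right|$
    $b = \sqrt{\gamma_t/2}\,\left(\sqrt{r/(r+1)} + \sqrt{1/(r+1)}\right)$

Four-phase signaling

Clairvoyant estimate:
    $a = \sqrt{\gamma_b/2}\,\left|\sqrt{\nu + 1 + \sqrt{\nu^2 + 1}} - \sqrt{\nu + 1 - \sqrt{\nu^2 + 1}}\right|$
    $b = \sqrt{\gamma_b/2}\,\left(\sqrt{\nu + 1 + \sqrt{\nu^2 + 1}} + \sqrt{\nu + 1 - \sqrt{\nu^2 + 1}}\right)$

Differential phase signaling:
    $a = \sqrt{\gamma_b/2}\,\left(\sqrt{2 + \sqrt{2}} - \sqrt{2 - \sqrt{2}}\right)$
    $b = \sqrt{\gamma_b/2}\,\left(\sqrt{2 + \sqrt{2}} + \sqrt{2 - \sqrt{2}}\right)$

Pilot signal estimate:
    $a = \sqrt{\gamma_t/[4(1+r)]}\,\left|\sqrt{2 + r + \sqrt{4 + r^2}} - \sqrt{2 + r - \sqrt{4 + r^2}}\right|$
    $b = \sqrt{\gamma_t/[4(1+r)]}\,\left(\sqrt{2 + r + \sqrt{4 + r^2}} + \sqrt{2 + r - \sqrt{4 + r^2}}\right)$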
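
As a quick consistency check on the two-phase entries, consider the pilot signal estimate with equal signal and pilot energies, $r = 1$. This is a worked sketch rather than part of the original derivation, and it assumes the pilot-signal constants have the form $a, b = \sqrt{\gamma_t/2}\,\bigl|\sqrt{r/(r+1)} \mp \sqrt{1/(r+1)}\bigr|$ with $\gamma_t = (\mathcal{E}_t/N_0)\sum_k |g_k|^2$ and $\mathcal{E}_t = \mathcal{E} + \mathcal{E}_p$:

$$
\begin{aligned}
a\big|_{r=1} &= \sqrt{\frac{\gamma_t}{2}}\left|\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}\right| = 0, \\
b\big|_{r=1} &= \sqrt{\frac{\gamma_t}{2}}\left(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\right) = \sqrt{\gamma_t} = \sqrt{2\gamma_b},
\end{aligned}
$$

where the last step uses $\mathcal{E}_t = 2\mathcal{E}$, so that $\gamma_t = 2\gamma_b$. Under these assumptions the constants coincide with those for two-phase differential phase signaling, $a = 0$ and $b = \sqrt{2\gamma_b}$.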




APPENDIX D 


Square Root Factorization 


Consider the solution of the set of linear equations

$$\mathbf{R}_N \mathbf{C}_N = \mathbf{U}_N \tag{D-1}$$

where $\mathbf{R}_N$ is an $N \times N$ positive-definite symmetric matrix, $\mathbf{C}_N$ is an $N$-dimensional
vector of coefficients to be determined, and $\mathbf{U}_N$ is an arbitrary $N$-dimensional vector.
The equations in D-1 can be solved efficiently by expressing $\mathbf{R}_N$ in the factored form

$$\mathbf{R}_N = \mathbf{S}_N \mathbf{D}_N \mathbf{S}_N^t \tag{D-2}$$

where $\mathbf{S}_N$ is a lower triangular matrix with elements $\{s_{ik}\}$ and $\mathbf{D}_N$ is a diagonal matrix
with diagonal elements $\{d_k\}$. The diagonal elements of $\mathbf{S}_N$ are set to unity, i.e., $s_{ii} = 1$.
Then we have

$$r_{ij} = \sum_{k=1}^{j} s_{ik} d_k s_{jk}, \qquad 1 \le j \le i-1, \quad i \ge 2 \tag{D-3}$$

$$r_{11} = d_1$$

where $\{r_{ij}\}$ are the elements of $\mathbf{R}_N$. Consequently, the elements $\{s_{ik}\}$ and $\{d_k\}$ are
determined from Equation D-3 according to the equations

$$d_1 = r_{11}$$

$$s_{ij} d_j = r_{ij} - \sum_{k=1}^{j-1} s_{ik} d_k s_{jk}, \qquad 1 \le j \le i-1, \quad 2 \le i \le N \tag{D-4}$$

$$d_i = r_{ii} - \sum_{k=1}^{i-1} s_{ik}^2 d_k, \qquad 2 \le i \le N$$

Thus, Equation D-4 defines $\mathbf{S}_N$ and $\mathbf{D}_N$ in terms of the elements of $\mathbf{R}_N$.
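
The recursion in Equation D-4 can be sketched in a few lines of Python. This is a minimal illustration using plain lists and zero-based indices; the function name `ldl_factor` and the example matrix are hypothetical, and the code assumes the input is symmetric and positive definite, so that every $d_j$ is nonzero.

```python
def ldl_factor(R):
    """Factor a symmetric positive-definite matrix as R = S diag(d) S^t.

    S is unit lower triangular; d holds the diagonal of D (Equation D-4).
    """
    n = len(R)
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for i in range(n):
        # Off-diagonal entries of row i: s_ij d_j = r_ij - sum_{k<j} s_ik d_k s_jk
        for j in range(i):
            acc = sum(S[i][k] * d[k] * S[j][k] for k in range(j))
            S[i][j] = (R[i][j] - acc) / d[j]
        # Diagonal entry: d_i = r_ii - sum_{k<i} s_ik^2 d_k
        d[i] = R[i][i] - sum(S[i][k] ** 2 * d[k] for k in range(i))
    return S, d


# Example matrix chosen for illustration only.
R = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
S, d = ldl_factor(R)
print(S)
print(d)
```

The three nested loops (over $i$, $j$, and $k$) make the $N^3$ operation count of the factorization visible directly in the code.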

The solution to Equation D-1 is performed in two steps. With Equation D-2 substituted
into Equation D-1 we have

$$\mathbf{S}_N \mathbf{D}_N \mathbf{S}_N^t \mathbf{C}_N = \mathbf{U}_N$$

Let

$$\mathbf{Y}_N = \mathbf{D}_N \mathbf{S}_N^t \mathbf{C}_N \tag{D-5}$$

Then

$$\mathbf{S}_N \mathbf{Y}_N = \mathbf{U}_N \tag{D-6}$$

First we solve Equation D-6 for $\mathbf{Y}_N$. Because of the triangular form of $\mathbf{S}_N$, we have

$$y_1 = u_1$$

$$y_i = u_i - \sum_{j=1}^{i-1} s_{ij} y_j, \qquad 2 \le i \le N \tag{D-7}$$

Having obtained $\mathbf{Y}_N$, the second step is to compute $\mathbf{C}_N$. That is,

$$\mathbf{D}_N \mathbf{S}_N^t \mathbf{C}_N = \mathbf{Y}_N$$

$$\mathbf{S}_N^t \mathbf{C}_N = \mathbf{D}_N^{-1} \mathbf{Y}_N$$

Beginning with

$$c_N = y_N / d_N \tag{D-8}$$

the remaining coefficients of $\mathbf{C}_N$ are obtained recursively as follows:

$$c_i = \frac{y_i}{d_i} - \sum_{j=i+1}^{N} s_{ji} c_j, \qquad 1 \le i \le N-1 \tag{D-9}$$
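
The two substitution steps of Equations D-7 to D-9 can be sketched as follows. This is a minimal illustration with zero-based indices; the function name `ldl_solve` and the numerical example are hypothetical, and the factors $\mathbf{S}_N$ and $\{d_k\}$ are assumed to be already available from the factorization of Equation D-4.

```python
def ldl_solve(S, d, U):
    """Solve S diag(d) S^t C = U given unit lower triangular S and diagonal d."""
    n = len(U)

    # Forward substitution, Equation D-7: S Y = U
    y = [0.0] * n
    for i in range(n):
        y[i] = U[i] - sum(S[i][j] * y[j] for j in range(i))

    # Back substitution, Equations D-8 and D-9: S^t C = D^{-1} Y
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = y[i] / d[i] - sum(S[j][i] * c[j] for j in range(i + 1, n))
    return c


# Illustrative factors of R = [[4, 2, 2], [2, 5, 3], [2, 3, 6]].
S = [[1.0, 0.0, 0.0],
     [0.5, 1.0, 0.0],
     [0.5, 0.5, 1.0]]
d = [4.0, 4.0, 4.0]
U = [14.0, 21.0, 26.0]      # equals R times [1, 2, 3]
C = ldl_solve(S, d, U)
print(C)
```

Each substitution touches every element of the triangular factor once, which is where the $N^2$ operation count for computing $\mathbf{C}_N$ comes from.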

The number of multiplications and divisions required to perform the factorization
of $\mathbf{R}_N$ is proportional to $N^3$. The number of multiplications and divisions required to
compute $\mathbf{C}_N$, once $\mathbf{S}_N$ is determined, is proportional to $N^2$. In contrast, when $\mathbf{R}_N$ is
Toeplitz the Levinson-Durbin algorithm should be used to determine the solution of
Equation D-1, since the number of multiplications and divisions is proportional to $N^2$.
On the other hand, in a recursive least-squares formulation, $\mathbf{S}_N$ and $\mathbf{D}_N$ are not
computed as in Equation D-3, but they are updated recursively. The update is accomplished
with $N^2$ operations (multiplications and divisions). Then the solution for the vector $\mathbf{C}_N$
follows the steps of Equations D-5 to D-9. Consequently, the computational burden of
the recursive least-squares formulation is proportional to $N^2$.
for pulsed interference, 

787-791 

CWEF (conditional weight 

enumeration function), 416 
Cyclic codes, 447 
CRC, 453 
decoding, 458 
encoding, 455 
generator polynomial, 448 
Golay, 460 
Hamming, 460 
message polynomial, 449 
parity check polynomial, 450 
shortened, 452 
systematic, 453 
Cyclic equalization, 694 
Cyclic redundancy check (CRC) 
codes, 453 

Cyclic subgroup, 482 
Cyclostationary random 
process, 70 


D transform, 493 
Data compression, 1, 335-354 
lossless, 335-348 
lossy, 348-354 

Decision-feedback equalizer ( see 
Equalizers, 
decision-feedback), 
661-665, 705-706 
Decision region, 163 
Decoding, 

Berlekamp-Massey, 469 
Fano algorithm, 525 
feedback, 529-531 
hard decision, 428 
iterative, 478, 548 
Meggit, 460 
sequential, 525-528 
soft decision, 424 
stack algorithm, 528-529 
turbo, 552 
LDPC, 570 

Viterbi algorithm, 243-244, 
Degrees of freedom, 75 
Delay distortion, 598-599 
Delay power spectrum, 834 
Demodulation, 24 
Demodulation and detection, 201 
carrier recovery for, (See Carrier 
phase estimation) 
coherent 

comparison of, 226-229 
of binary signals, 173-177 
of biorthogonal signals, 
207-209 

of orthogonal signal, 203-207 
of PAM signals, 188-190 
of PSK signals, 190-195 
of QAM signals, 196-200 
optimum, 201-203 
correlation type, 177-178 
of CPM, 243-258 
performance, 251-258 
for intersymbol interference, 
623-628 

matched filter-type, 178-182 
maximum likelihood, 163 
maximum-likelihood sequence, 
623-628 

noncoherent, 210-224 
of binary signals, 219-221 
of M- ary orthogonal signals, 
216-219, 741-743, 
861-865 

multichannel, 737-743 
optimum, 212-214 
of OFDM, 749 
Density of a lattice, 236 
Detector 

decorrelating, 1043-1045 
envelope, 214 
inverse channel (ICD), 970 
maximum-likelihood 
(MLD), 970 

MMSE, 970, 1046-1047 
minimum distance, 171 
nearest neighbor, 171 
nonlinear, 973-974 
optimal noncoherent, 212-214 
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single user, 1042-1043 
sphere, 973 

Differential encoding, 1 1 5 
Differential entropy, 349 
Differential phase-shift keying 
(DPSK), 221 
Differentially encoded 
PSK, 195 

Digamma function, 909 
Digital communication system 
model, 1-3 

Digital modulation, 95 
Digital modulator, 2 
Digital signaling, 95 
Dimensionality theorem, 

227 

Direct sequence (See Spread 
spectrum signals) 

Dirty paper precoding, 1054 
Discrete memoryless source 
(DMS), 331 

Discrete-memoryless channel 
(DMC), 356 

Discrete-time AWGN, 358 
Discrete-time AWGN channel 
capacity, 365 

Discrete-time binary-input channel 
capacity, 362 

Distance (see Block codes; 
Convolutional codes) 
effective, 927 
enumerator function, 1 85 
Euclidean, 35 
Hamming, 414 
metric, 173 
product, 925 

Distortion (see Channel distortion) 
Hamming, 354 
squared-error, 350 
Distortion-rate function, 352 
Diversity 
antenna, 851 
frequency, 850 
gain, 996-997 
order, 852, 927 
performance of, 85 1-859 
polarization, 851 
RAKE, 851 
signal space, 928 
time, 85 1 

DMC (see Discret Memoryless 
Channel) 

DMS (see Discret Memoryless 
Source) 

Double-sideband (DSB) 

PAM, 100 
DPSK, 221 

error probability, 223 
DSB, 100 
Dual code, 412 
Dua\-k codes, 537-540 
Duobinary signal, 610 

e-outage capacity, 907 
Early-late gate synchronizer, 
318-321 

Effective antenna area, 262 
Effective distance, 927 


Effective radiated power, 260-261 
Eigenvalue, 29, 1086 
Eigenvector, 29, 1086 
Elias bound, 443 
Encoder 

catastrophic, 509 
convolutional, 402, 492 
for cyclic codes, 455 
inverse, 508 
turbo, 549 

Encoding (see Block codes; 

Convolutional codes) 
Energy, 25 
average, 97 
per bit, 
average, 97 
Entropy, 333 
chain rule, 335 
conditional, 334 
differential, 349 
joint, 334 
Entropy rate, 337 
Envelope detection, 214 
Envelope of a signal, 23 
Equivalent codes, 412 
Equivalent convolutional 
encoders, 506 

Equalizers (See also Adaptive 
equalizers) 

at transmitter, 668-669 
decision-feedback, 661-665, 

705- 706 

adaptive, 689-731 
examples of performance, 
662-665 

for MIMO channels, 979-981 
of trellis-coded signals, 

706- 708 

minimum MSE, 663 
predictive form, 665-667 
linear, 640-649 
adaptive, 689-693 
baseband, 658-659 
convergence of MSE 
algorithm, 695-696 
cyclic equalization, 694 
error probability, 651-655 
examples of performance, 
651-655 

excess MSE, 696-697 
for MIMO channels, 975-979 
fractionally spaced, 655-658 
LMS (MSE) algorithm, 
691-693 

mean-square error (MSE) 
criterion, 645-655 
minimum MSE, 647-648 
output SNR for, 648 
passband, 658-659 
peak distortion, 641 
peak distortion criterion, 
641-645 

phase-splitting, 659 
zero-forcing, 642 
iterative equalization/decoding, 
671-673 

maximum a posteriority 
probability (MAP), 291 


maximum -likelihood sequence 
estimation, 623-625, 
reduced-state, 669-67 1 
self-recovering (blind), 721-731 
with trellis-coded modulation, 
706-708 

using the Viterbi algorithm, 
628-631 

channel estimator for, 
703-705 

performance of, 631-639 
reduced complexity, 

669-671 

reduced-state, 669-67 1 
erfc, 44 

Ergodic capacity, 900, 905-906, 
985-987 

Error correction, 900 
Error detection, 432 
Error floor, 551 
Error probability, 

16QAM, 186, 200 
ASK, 189 

binary antipodal signaling, 174 
binary equiprobable 
signaling, 174 
binary orthogonal 
signaling, 176 
biorthogonal signaling, 208 
bit, 164,417 
block, 417 
DPSK, 223 

for hard-decision decoding, 
945-946 

for soft-decision decoding, 

943_944 

FSK, 205 

lower bound to, 1 86 
M-ary PSK, 190-194 
for Rayleigh fading, 859-861, 
1100-1103 

for Ricean fading, 1 104—1 105 
for AWGN channel, 1106 
message, 164 

multichannel binary symbols, 
739-741, 1090-1095 
orthogonal signaling, 205 
noncoherent detection, 216 
pairwise, 184, 372, 418, 

922, 928 
PAM, 189 
QAM, 198 
QPSK, 199 
symbol, 164 
union bound, 182 
word, 417 
Estimate 
biased, 323 
clairvoyant, 1098 
consistent, 324 
efficient, 324 
pilot signal, 1098 
unbiased, 323 

Estimate of phase (See Carrier 
phase estimation) 
Estimation 

maximum-likelihood, 291, 
296-298, 321-322 


of carrier phase, 295-315 
of signal parameters, 290 
of symbol timing, 290 
of symbol timing and carrier 
phase, 321-322 
performance of, 323-326 
Euclidean distance, 35 
Euler’s constant, 909 
Excess bandwidth, 607 
Excess MSE, 696-697 
Excision of narrowband 

interference, 791-796 
linear, 792-796 
nonlinear, 796 
EXIT charts, 555 
Exponential random variable, 46 
Expurgated codes, 447, 

950-951 

Extended codes, 447 
Extended Golay code, 424 
Extension field, 404 
Extrinsic information, 552 
Extrinsic L-value, 552 
Eye pattern, 603 

Factor Graphs, 558 
Fading, 8, 830-844 
figure, 52 

Fading channels (See also 
Channels), 830-890 
coding for, 899-960 
ergodic capacity, 900, 905-906, 
985-987 

outage capacity, 900, 906, 907, 
900, 987-990 
propagation models for, 
842-843 

Feedback decoding, 529-531 
FH spread spectrum signals (see 
Spread spectrum signals), 

Field 

characteristic, 404 
extension, 404 
finite, 403 
Galois, 403 
ground, 404 

minimal polynomial of an 
element, 408 
order of an element, 407 
primitive element, 407 
Figure of merit 
baseline, 239 
constellation, 238 
Filtered multitone (FMT) 
modulation, 754 

Filters, 

matched, 178-182 
whitening, 627 
Finite fields, 403 
Finite-state channels, 903 
capacity, 903-905 
Fire codes, 475 
First-event error, 502 
First-event error probability, 

513 

Fixed weight codes, 411, 

949-953 

Fixed-length source coding, 339 


1146 


Index 


Folded spectrum, 644 
Forward recursion, 543 
Free Euclidian distance, 577 
Free-space path loss, 262 
Frequency diversity, 850 
Frequency range 
wireline channels, 5 
wireless (radio) channels, 6 
Frequency division multiple access 
(FDMA), 1029 
capacity of, 103 1-1 032 
Frequency domain coding, 
942-960 

Frequency hopped (FH) spread 
spectrum, 802-804 
Frequency support, 20 
Frequency-shift keying (FSK), 
109-110 

continuous-phase (CPFSK), 
116-118 

error probability, 205 
noncoherent detection, 215 
power density spectrum, 154 
Frobenius norm, 982 
Fundamental coding gain, 586 
Fundamental volume of a 
lattice, 233 

Galois fields, 403 

minimal polynomial, 464 
subfield, 483 
Gamma function, 45 
complementary, 91 1 
Digamma function, 909 
Gamma random variable, 46 
Gaussian minimum-shift keying 
(GMSK), 118 
Gaussian noise, 10 
Gaussian random process, 10, 68 
Gaussian random variable, 41 
Generalized RAKE demodulator, 
880-882 
Generator matrix 
lattice, 231 

of linear block codes, 412 
of space-time block code, 

1006 

transform domain, 495 
Generator polynomial, 448, 464 
Gilbert- Varsharmov bound, 443 
Girth of a graph, 560 
GMSK, 118, 127 
Golay codes, 424, 460 
extended, 424 
ternary, 442 
Gold sequences, 799 
Gram-Schmidt procedure, 29 
Graphs, 558-568 
bipartite, 559 
constraint nodes, 561 
cycle-free, 560 
cycles, 560 
factor, 558 
girth, 560 

global function, 56 1 
local functions, 561 
Tanner, 558 
variable nodes, 560 


Gray coding, 100 
Gray labeling, 939 
Ground field, 404 
Group 

Abelian, 403 
identity element, 404 

Hadamard codes, 423, 951-953 
Hamming bound, 441 
Hamming codes, 420, 460 
Hamming distance, 414 
Hamming distortion, 354 
Hard decision decoding, 
of block codes, 428^436 
of convolutional codes, 509-516 
Hata model, 843 
Hermite parameter, 233 
Hermitian matrix, 65, 1085 
Hermitian symmetry, 19 
Hermitian transpose of a matrix, 28 
Hexagonal lattice, 230 
Hilbert transform, 22 
Homogeneous Markov chains, 72 
Huffman coding, 342-346 

Identity element, 404 
iid random variables, 45 
Illumination efficiency factor, 262 
Impulse noise, 601 
Impulse response, 

for bandpass systems, 27 
In-phase component, 22 
Inequality 

Cauchy-Schwarz, 29-30 
Kraft, 340 
Markov, 56 
triangle, 29-30 
Information sequence, 1, 401 
Information source 

discrete memoryless, 33 1 
memoryless, 331 
stationary, 331 
Inner code, 479 
Inner product, 26, 28, 30 
Input-output weight enumeration 
function (IOWEF), 416 
Instantaneous codes, 340 
Interference margin, 774 
Interleaver 
block, 476 
convolutional, 476 
gain 552 

uniform, 480-48 1 
Interleaving, 476-477 
Intersymbol interference, 599-600, 
603-604 

controlled {see Partial response 
signals), 609-61 1 
discrete-time model for, 626 
equivalent white noise filter 
model, 627 

optimum demodulator for, 
623-628 

Inverse channel detector 
(ICD), 970 
Inverse filter, 642 
Irreducible Markov chains, 73 
Irreducible polynomial, 405 


Irregular LDPC, 570 
Irrelevant information, 166 
Iterative decoding, 478, 548-558 
error floor, 551 
EXIT charts, 555 
turbo cliff region, 553 
waterfall region, 553 

Jakes’ model, 838-839 
Jensen’s inequality, 386 
Joint entropy, 334 
Jointly Gaussian random 
variables, 54 

Jointly wide-sense stationary 
processes, 54 

Kalman (RLS) algorithm, 

711-714 

Kalman gain vector, 712 
Karhunen-Loeve expansion, 76 
Kasami sequences, 799 
Kissing number of a lattice, 232 
Kolmogorov- Wiener filter, 13 
Kraft inequality, 340 

Labeling 
Gray, 939 
set portioning, 939 
Lattice 

coding gain, 233 
coset, 584 
density, 236 
equivalent, 231 
filter, 716-721 
fundamental volume, 233 
generator matrix, 23 1 
Hermite parameter, 233 
hexagonal, 230 
kissing number, 232 
minimum distance, 232 
multidimensional, 234 
multiplicity, 232 
recursive least squares, 708, 715 
Schlafli, 234 
Sublattice, 234 
Voronoi region, 232 
Law of large numbers (LLN), 63 
LDPC (low density parity check 
codes), 568-571 
code density, 569 
decoding, 570 

degree distribution polynomial, 
570 

irregular, 570 
regular, 569 
Tanner graph, 569 
Least-squares algorithms, 710-720 
Lempel-Ziv algorithm, 346-348 
Lengthened codes, 446 
Levinson-Durbin algorithm, 
692,716 

Likelihood function, 292 
Linear block codes, 400-490 
Linear equalization {see 
Equalizers, linear) 
Linear-feedback shift-register, 

maximum length, 798-799 
Linear filter channel, 1 1 


Linear modulation, 110 
Linear prediction, 716 
backward, 7 1 8 
forward, 717 
residuals, 718 

Linear time- varying channel, 1 1 
Linearly independent signals, 30 
Link budget analysis, 261-265 
Link margin, 246 
LLN {see law of large numbers) 
Log-APP (log a posteriori 
probability), 546 
Log-MAP (log maximum a 

posteriori probability), 546 
Lognormal random variable, 54 
Lossless data compression, 335 
Lossless source coding theorem, 
336 

Lossy data compression, 335 
Low density parity check codes 
{see LDPC) 

Lowpass equivalent, 22 
Lowpass signal, 20 
Low probability of intercept, 
778-779 

Mac Williams identity, 415 
MAP (maximum a posteriori 
probability), 162-163, 

291 

Mapping by set partitioning, 572 
Marcum’s (7-function, 47 
generalized, 47 
M - ary modulation, 2 
Markov chains, 71-74 
aperiodic states, 73 
equilibrium probabilities, 73 
ergodic, 73 
homogeneous, 72 
irreducible, 73 
period of state, 73 
state, 72 

state probability vector, 72 
state transition matrix, 72 
stationary probabilities, 73 
steady-state probabilities, 73 
Markov inequality, 57-58 
Matched filter, 178-182 
frequency domain, 179 
receiver, 178 
Matrix 

condition number, 1088 
eigenvalue, 1086 
eigenvector, 1086 
generator, 412-413 
Hermitian, 65 
Hermitian transpose, 28 
norm, 1088 
orthogonal, 231 
parity check, 412-413 
rank, 1085 
singular values, 1087 
skew-Hermitian, 65 
symmetric, 1085 
trace of, 1085 
transpose, 28 

Max-Log- APP algorithm, 548 
Max-Log-MAP algorithm, 548 
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Maximal ratio combiner, 852 
Maximum a posteriori probability 
( see MAP), 

Maximum-distance separable 
codes, 440 

Maximum free distance codes, 516 
tables of, 517-520 
Maximum-length shift register 
codes, 461, 798-799 
Maximum likelihood, 

parameter estimation, 290-291, 
321-322 

for carrier phase, 292-298 
for joint carrier and symbol, 
321-322 

for symbol timing, 315-321 
performance of, 323-324 
Maximum-likelihood (ML) 

receiver, 163, 623-625, 
Maximum likelihood sequence 
detection (MLSD), 
623-625, 

Maximum ratio combining, 852 
performance of, 85 1-855 
McEliece-Rodemich-Rumsey- 
Welch (MRRW) bound, 

443 

MDS (maximum-distance 

separable) codes, 440 
Mean-square error (MSE) 
criterion, 645-655 
Meggit decoder, 460 
Memoryless channel, 355 
Memoryless modulation, 95 
Memoryless source, 33 1 
Mercer’s theorem, 77 
Message error probability, 164 
PSK, 194 
QPSK, 193 

Message polynomial, 449 
Metric 

correlation, 173 
distance, 173 
modified distance, 173 
MGF (moment generating 
function), 44 

Microwave LOS channel, 8 
MIMO channels, 966 
capacity of, 982-984, 990-991 
ergodic, 985-986 
outage, 987-990 
coding for, 1001-1021 
bit-interleaved, 1003-1006 
space-time codes, 1006-1021 
temporal, 1003-1006 
slow fading, 968-969, 975-979 
spread spectrum signals for, 
992-996 

MIMO systems, 966 
detectors for, 970-974 
diversity gain for, 996-997 
error rate performance, 

971-973 

lattice reduction for, 973-974 
multicode, 997-1000 
multiplexing gain for, 996-997 
outage probability, 987-988 
scrambling sequence for, 997 


singular- value decomposition 
for, 974-975 

spread spectrum, 992-996 
Minimal polynomial, 408 
Minimum distance, 414 
Minimum distance detector, 171 
Minimum distance of a 
constellation, 185 
Minimum distance of a lattice, 232 
Minimum weight, 414 
Minimum-shift keying (MSK), 
123-124 

power spectrum of, 144 
ML {see maximum-likelihood) 
MLSD, 623-625, 

Modified Bessel function, 47, 213 
Modified distance metric, 173 
Modified duobinary signal, 610 
Modulation 
binary, 2 

comparison of, 226-229 
constraint length, 96 
continuous-phase FSK 
(CPFSK), 116-118 
power spectrum, 138-145 
continuous-phase modulation 
(CPM), 118-123 
digital, 95 
DPSK, 221-223 
equicorrelated (simplex), 
112-113, 209-210 
frequency-shift keying (FSK), 
109-110, 205,215-216 
linear, 110 

M-ary orthogonal, 108-111, 
204-207, 216-219 
memoryless, 95 
multichannel, 737-743 
multidimensional, 108-113 
NRZ, 115 
NRZI, 115 
nonlinear, 110 
OFDM, 746-752 
offset QPSK, 
phase-shift keying (PSK), 
101-103, 191-195 
pulse amplitude (PAM, ASK), 
98-101, 188-190 
quadrature amplitude (QAM), 
103-107, 185-187, 
196-200 

with memory, 95-96 
Modulator, 2, 24 
binary, 2 
digital, 95 
linear, 110 
M- ary, 2 
memoryless, 95 
nonlinear, 110 
pulse amplitude, 98-101 
quadrature amplitude, 103-107 
with memory, 95-96 
Moment generating function {see 
MGF) 

Monic polynomial, 405 
Moore-Penrose pseudoinverse, 
1088 

Morse code, 12, 339 


MRRW (McEliece-Rodemich- 
Rumsey- Welch) bound, 

443 

MSK, 123-124, 144 
Multicarrier communications, 
743-759 

capacity of, 744-745 
channel coding consideration, 
759 

FFT-based system, 749-752 
Filtered multitone (FMT), 754 
OFDM, 746-742 
bit allocation, 754-757 
power allocation, 754-757 
peak-to-average ratio, 757-759 
spectral characteristics, 752-754 
Multichannel communications, 
737-743 

noncoherent combining 
loss, 741 

with binary signals, 739-741 
with M-ary orthogonal signals, 
741-743 

Multicode MIMO systems, 
997-1000 

Multidimensional signaling, 

108 

Multipath channels, 8, 831 
Multipath intensity profile, 

834 

Multipath spread, 834 
Multiple access methods, 
1029-1031 

capacity of, 1031-1035 
CDMA, 1033-1034 
FDMA, 1031-1032 
random accesss, 1068-1077 
TDMA, 1032-1033 
Multiple antenna systems, 
966-1021 

inverse channel detector, 

970 

maximum-likelihood detector, 
970 

minimum MSE detector, 970 
space-time codes for, 1006-1021 
concatenated codes, 
1020-1021 

differential STBC, 1014 
orthogonal STBC, 1011-1013 
quasi-orthogonal STBC, 1013 
trellis codes, 1016-1019 
turbo codes, 1020-1021 
Multiplexing gain, 996-997 
Multiplicity of a lattice, 232 
Multistage interference 

cancellation, 1043-1049 
Multiuser communications, 

1028 

multiple access, 1029-1034 
multiuser detection, 

1029-1034 

random access, 1068-1077 
Multiuser detection, 1034 
decorrelating detector, 
1043-1045 

for asynchronous transmission, 
1039-1042 


for broadcast channels, 
1053-1068 

for CDMA, 1036-1053 
for random access, 1068-1077 
for synchronous transmission, 
1038-1039 

single user detector, 1042-1043 
Mutual information, 332 

Nakagami random variable, 

52, 841 

Narrowband interference, 791-796 
Narrowband process, 79 
Narrowband signal, 18-21 
Nat, 333 

Nearest neighbor detector, 171 
Negative spectrum, 20 
Noise, 

Gaussian, 10 
thermal, 3, 69 
white, 90 

Noise equivalent bandwidth, 92 
Noisy channel coding theorem, 361 
Non-central x 2 random 
variable, 46 

Noncoherent combining loss, 74 1 
Noncoherent detection, 210-226 
error probability for orthogonal 
signals, 216-218 
FSK, 215-216 
Nonlinear distortion, 600 
Nonlinear modulation, 110 
Norm 

of a matrix, 1088 
of a signal, 30 
of a vector, 28 
Normal equations, 716 
Normal random variable, 41 
NRZ, 115 
NRZI, 115 

Nyquist criterion, 604-605 
Nyquistrate, 13 

OFDM, 746-752, 844-890 
bit and power allocation, 
754-757 

degradation due to Doppler 
spreading, 884-889 
FFT implementation, 749-752 
ICI suppression in, 889-890 
peak-to-average ratio, 757-759 
Offset QPSK (OQPSK), 124-128 
On-off keying (OOK), 267, 949 
Optimal detection 
after modulation, 202 
binary antipodal signaling, 173 
binary orthogonal signaling, 176 
biorthogonal signaling, 207 
simplex signaling, 209 
OQPSK, 124-128 
Order of a field element, 407 
Orthogonal matrix, 23 1 
Orthogonal signaling, 108 

achieving channel capacity, 367 
error probability, 205 

with noncoherent detection, 
216-218 

Orthogonal signals, 26, 30 
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Orthogonal vectors, 28 
Orthogonality principle, 646 
mean-square estimation, 

646 

Orthonormal 
vectors, 28 
basis, 28 
signal set, 30 

Outage capacity, 900, 907, 913 
of MIMO channels, 987-990 
Outage probability, 

of MIMO channels, 987-988 
Outer code, 

Pairwise error probability (PEP), 
184, 372,514, 922, 
1014-1016 

Chernov bound, 373, 1014—1016 
PAM, 98-101 
Parallel contatenated block 
codes, 48 1 

Parallel concatenated convolutional 
codes (PCCC), 548 
Parity check bits, 412 
Parity check matrix, 412 
Parity check polynomial, 450 
Partial-band interference, 804 
Partial response signals, 609-61 1 
duobinary, 610 
error probability of, 617-618 
modified duobinary, 610 
precoding for, 613 
Partial-time (pulsed), 784 
Path memory truncation, 246 
PCBC (parallel concatenated block 
codes), 481 

PCCC (parallel concatenated 

convolutional codes), 548 
Peak distortion criterion, 641-645 
Peak frequency deviation, 117 
Peak- to- average ratio, 757-759 
PEP ( see pairwise error 
probability) 

Perfect codes, 434, 442 
Phase of a signal, 23 
Phase jitter, 600 
Phase-locked loop (PLL), 

298-315 
Costas, 312-313 
decision-directed, 303, 308 
loop damping factor, 299 
M-law type, 313-314 
natural frequency, 299 
non-decision-directed, 308-315 
square-law type, 310-312 
Phase tree, 120 
Phase trellis, 120 
Phase-shift keying (PSK), 

101-103 

Pilot signal, 1098 
Plotkin bound, 442 
PN sequences, 463, 796-801 
Polynomial 
irreducible, 405 
minimal, 408 
monic, 405 
prime, 405 
syndrome, 458 


Positive spectrum, 20 
Power efficiency, 226 
Power spectral density, 67 
continuous component, 133 
CPFSK, 138-145 
discrete component, 133 
for in-phase component, 80 
for lowpass process, 8 1 
for quadrature component, 80 
linearly modulated signals, 133 
Power spectrum, 67 
Pre-envelope, 21 
Precoding 

for broadcast channels, 
1053-1068 
dirty paper, 1054 
linear, 1055-1058 
nonlinear, 1058-1068 
QR decomposition, 
1058-1062 
vector, 1062-1065 
via lattice reduction, 
1065-1068 

for spectral shaping, 133-135, 
611-612 

Prediction ( see Linear 
prediction), 

Preferred sequences, 799 
Prefix condition, 340 
Preprocessing, 166 
Prime polynomial, 405 
Primitive BCH codes, 463 
Primitive element, 407 
Probability distributions 
binomial, 41 
chi-square, 
central, 45^46 
noncentral, 46—48 
gamma, 46 
Gaussian, 41-45 
log normal, 54 
multivariate Gaussian, 54-56 
Nakagami, 52-53 
Rayleigh, 48-50 
Rice, 50-52 
uniform, 4 1 

Processing gain, 773-774 
Probability transition matrix of a 
channel, 357 
Product codes, 477 
Product distance, 925 
Prolate spheroidal wave 
functions, 227 

Proper random processes, 71 
Proper random vectors, 65 
PSD (power spectral density), 67 
Pseudo-noise (PN) sequences, 
796-801 

autocorrelation function, 798 
generation via shift 
register, 797 
Gold, 799 
Kasami, 799 
maximal-length, 797 
peak cross-correlation, 799 
preferred, 799 
( see also Spread spectrum 
signals), 


Pseudocovariance 
for complex random 
processes, 7 1 
PSK, 101-103, 191-195 
bit error probability, 195 
Differential (DPSK), 221 
differentially encoded, 195 
message error probability, 194 
Pulse amplitude modulation 
(see PAM) 

Pulsed interference, 784 

effect on error rate performance, 
785-791 

Punctured codes, 446, 516, 
521-523 

Punctured convolutional codes, 
516, 521-523 
rate compatible, 523-525 
Puncturing matrix, 520, 522 
Pythagorian relation, 29 

(2-function, 41 

QAM, 103-107, 185-187, 

196-200 

error probability, 196-200 
QPSK, 102 
error probability, 199 
message error probability, 193 
offset (OQPSK), 124 
Quadrature amplitude modulation 
(see QAM) 

Quadrature component, 22 
Quasi-perfect codes, 435 
Quaternary PSK (QPSK), 102 

R 0 (channel cutoff rate), 527, 
787-791, 957-960 
For fading channels, 957-960 
Raised cosine spectrum, 607 
excess bandwidth, 607 
rolloff parameter, 607 
RAKE demodulator, 869-882 
for binary antipodal signals, 878 
for binary orthogonal signals, 
874-877 

for DPSK signals, 878 
for noncoherent detection of 
orthogonal signals, 879 
generalized, 880-882 
Random access, 1068-1077 
ALOHA, 1069-1073 
carrier sense, 1073-1077 
with collision detection, 1073 
non persistent, 1074 
1-persistent, 1074 
p-persistent, 1074-1077 
offered channel traffic, 1070 
slotted ALOHA, 1070 
throughput, 1070 
unslotted, 1070 
Random coding, 362, 375 
Random processes, 66-81 
bandlimited, 74-76 
bandpass, 78-81 
cross spectral density, 67 
cyclostationary, 70 
discrete-time, 69 
Gaussian, 68 


jointly wide-sense 
stationary, 67 
narrowband, 79 
power, 68 

power spectral density, 67 
power spectrum, 67 
proper, 7 1 

sampling theorem, 74 
series expansion, 74 
white, 69 

wide-sense stationary, 67 
Random variables, 40-57 
Bernoulli, 40 
binomial, 41 

characteristic function, 44 

X 2 ,45 

complex, 63 

exponential, 46 

gamma, 46 

Gaussian, 41 

iid, 45 

jointly Gaussian, 54 
lognormal, 54 
moment generating 
function, 44 
Nakagami, 52 
non-central x 2 , 46 
normal, 41 
Rayleigh, 48 
Ricean, 50 
uniform, 41 
Random vectors, 
circular, 66 

circularly symmetric, 66 
complex, 64 
proper, 65 
Rate 
bit, 97 
code, 2, 402 
signaling, 97 

Rate-compatible punctured 
convolutional codes 
(RCPCC), 523-525 
Rate-distortion function, 350 
Shannon’s lower bound, 353 
Rate-distortion theorem, 351 
Rayleigh fading channel, 833, 841, 
846-868 

CSI at both sides, 912 
CSI at receiver, 909, 957-960 
ergodic capacity, 907 
for MIMO channels, 985-987 
no CSI, 908 
outage capacity, 913 
for MIMO channels, 987-990 
Rayleigh random variable, 48 
RCC (recursive convolutional 
codes), 507 

RCPCC (rate-compatible 

punctured convolutional 
codes), 523-525 
Receiver 

correlation, 177 
MAP, 162 

matched filter, 178-182 
ML, 163, 623-625 
Receiver implementation, 177 
Reciprocal polynomial, 450 
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Recursive convolutional codes, 
Recursive least squares (RLS) 
algorithms, 710-721 
fast RLS, 715 
RLS Kalman, 711-714 
RLS lattice, 716-721 
Recursive systematic convolutional 
codes (RSCC), 507 
Reed-Muller codes, 42 1 
Reed-Solomon codes, 441, 446, 
471-475 

burst error correction, 473 
decoding, 473 
MDS property, 472 
weight enumeration polynomial, 
473 

References, 1109 

Regenerative repeaters, 260-261 

Reliability function, 369 

Reliable communication, 207, 361 

Residuals, 7 1 8 

Rice factor, 5 1 

Ricean fading channel, 833, 

Ricean random variable, 50-52 
RS codes ( see Reed-Solomon 
codes) 

RSCC ( see recursive systematic 
convolutional codes) 

Sampling theorem, 74 
Scattering function, 837 
SCBC ( see serially concatenated 
block codes) 

Schlafli lattice, 234 
Scrambling sequence, 997 
Sequential decoding, 525-528 
Serially concatenated block 
codes, 480 

Set partitioning labeling, 

572-573, 939 
Shannon 

first theorem, 336 
lower bound on R(D), 353 
second theorem, 361 
third theorem, 35 1 
Shannon limit, 207, 554, 570 
Shaping, 586 
Shaping gain, 240, 586 
Shortened codes, 445 
Shortened cyclic codes, 452 
Signal (see also Signals) 
analytic, 21 
bandpass, 21 
bandwidth, 20 
baseband, 20 
complex envelope, 22 
energy of, 25 
envelope of, 23 
fading, 8 

in-phase component, 22 
lowpass, 20 
lowpass equivalent, 22 
multipath, 8, 831 
narrowband, 18-21 
norm, 30 

parameter estimation, 

290-326 
phase, 23 


quadrature components of, 22 
spectrum, 19 
Signal design, 602-611, 

619-623 

for band-limited channel, 602 
for channels with distortion, 
619-623 

for no intersymbol interference, 
604-609 

with partial response pulses, 
609-611 

with raised cosine spectral pulse, 
607-608 

Signal constellation, 28 
Signal space diversity, 928 
Signal space representation, 34 
Signal-to-noise ratio (SNR), 
176, 192 
Signaling 

based on binary codes, 113 
binary antipodal, 101 
biorthogonal, 111 
digital, 95 

multidimensional, 108 
non-return-to-zero (NRZ), 115 
non-return-to-zero, inverted 
(NRZI), 115 
on-off, 267 
orthogonal, 108 
simplex, 112 
with memory, 114 
Signaling interval, 96 
Signaling rate, 97 
Signals 

antipodal, 101 
binary coded, 113 
binary orthogonal, 176-177 
biorthogonal, 111 
digitally modulated, 95 
cyclostationary, 70-71, 131 
representation of, 28, 95 
spectral characteristics, 131 
inner product, 26 
M-ary orthogonal, 108-111 
multiamplitude, 98 
multidimensional, 108-114 
multiphase, 101-103 
orthogonal, 30 
random, 66-81 
autocorrelation, 67 
bandpass stationary, 78-81 
cross correlation of, 67 
power density spectrum, 67 
properties of quadrature 
components, 79-81 
white noise, 69 

quadrature amplitude modulated 
(QAM), 103-106 
simplex, 112-113 
Signature sequence, 1037 
Simplex signaling, 112-113 
optimal detection, 209-210 
Single-sideband (SSB) PAM, 100 
Singleton bound, 440 
Singular-value decomposition, 
974-975, 981-982, 1087 
left singular vectors, 981, 1087 
right singular vectors, 981, 1087 
singular values, 974, 981, 1087 

SISO (soft-input-soft-output) 
decoder, 545 

Skew-Hermitian matrix, 65 
Skin depth, 9 
SNR, 176 
per bit, 176 
per symbol, 192 
Soft decision decoding, 424 
Source, 330-354 
analog, 330 
binary, 331 
discrete memoryless 
(DMS), 332 
discrete stationary, 337 
encoding, 339-354 
discrete memoryless, 339 
Huffman, 342-346 
Lempel-Ziv, 346-348 
Source coding, 1, 339-354 
Space-time codes, 1006-1021 
concatenated, 1020-1021 
differential STBC, 1014 
orthogonal STBC, 1011-1013 
quasi-orthogonal STBC, 1013 
trellis, 1016-1019 
turbo, 1020-1021 
Spaced-frequency, spaced-time 
correlation function, 835 
Spatial rate, 1007 
Spectral bit rate, 226 
Spectral shaping 
by precoding, 134, 611-612 
Spectrum 

of CPFSK and CPM, 138-147 
of digital signals, 131-148 
of linear modulation, 133-135 
of signals with memory, 
131-133, 135-147 
Specular component, 841 
Sphere packing, 235 
Sphere packing bound, 441 
Spread factor, 845 
table of, 845 

Spread spectrum multiple access 
(SSMA), 1031 
Spread spectrum signals, 
763-765 

acquisition of, 816 
for code division multiple access 
(CDMA), 779-780, 
813-814 

for MIMO systems, 992-996 
concatenated codes for, 776-778 
direct sequence, 765-768 
application of, 778-784 
coding for, 776-778 
demodulation of, 767-768 
performance of, 768-773 
with pulse interference, 
784-791 

excision of narrowband 
interference, 791-796 
for low-probability of intercept 
(LPI), 778-779 


for multipath channels, 
869-871, 997-1000 
frequency-hopped (FH), 
802-804 

block hopping, 803 
performance of, 804-806 
with partial-band interference, 
806-812 

hybrid combinations, 814-815 
interference margin, 774 
processing gain, 773-774 
synchronization of, 815-822 
time-hopped (TH), 814 
tracking of, 819-822 
uncoded DS, 775 
Spread spectrum system model, 
763-765 

Square-law detection, 216 
Square-root factorization, 715 
SQPSK, 124-128 
SSB, 100 

Staggered QPSK (SQPSK), 
124-128 

Standard array, 430 
State diagram, 496 
Stationary random processes, 
wide-sense, 67 
Stationary source, 337 
Steepest-descent (gradient) 
algorithm, 691-701 
Storage channel, 9 
Subfield, 483 
Sublattice, 234 
Subscriber local loop, 756 
Successive interference 
cancellation, 1047-1048 
Sufficient statistics, 166 
Sum-Product algorithm, 558-567 
Survivor path, 244, 512 
SVD (see Singular-value 
decomposition) 

Symbol error probability, 164 
Symbol rate, 97 
Symbol SNR, 192 
Symmetric channel capacity, 363 
Synchronization 
carrier, 290-315 
effect of noise, 300-303 
for multiphase signals, 
313-314 

with Costas loop, 312-315 
with decision-feedback loop, 
303-308 

with phase-locked loop 
(PLL), 298-303 
with squaring loop, 310-312 
of spread spectrum signals, 
815-822 

with tau-dither loop, 820 
with delay-locked loop, 819 
sequential search, 818 
sliding correlator, 816 
symbol, 290-291, 315, 321 
Syndrome, 430, 467 
polynomial, 458 
Systematic block codes, 412 
Systematic convolutional codes, 
Systematic cyclic codes, 453 
Tail probability bounds, 56-63 
Chernov bound, 58-63, 866-868 
Markov bound, 56, 57 
Tanner graph, 558-561 

for low density parity check 
codes, 569-570 
TATS (tactical transmission 
system), 813 
Telegraphy, 12 

Telephone channels, 598-601 
Ternary Golay code, 442 
Theorem 

central limit, 63 
dimensionality, 227 
lossless source coding, 336 
Mercer, 77 

noisy channel coding, 361 
rate-distortion, 351 
Shannon’s second, 361 
Shannon’s third, 351 
Wiener-Khinchin, 67 
Thermal noise, 3, 69 
Threshold decoder, 531 
Time diversity, 851 
Time division multiple access 
(TDMA), 1030 
capacity of, 1032-1033 
Timing phase, 315 
Toeplitz matrix, 700 
Tomlinson-Harashima precoding, 
668-669 

Transfer function of convolutional 
codes, 500 


Transform domain generator 
matrix, 495 

Transpose of a matrix, 28 
Tree diagram, 496 
Trellis, 116, 243, 496 
Trellis-coded modulation, 
571-589 
encoders for, 583 
for fading channels, 929-935 
free Euclidean distance, 577 
set partitioning, 572 
subset decoding, 578 
tables of coding gains for, 
581-582 

turbo coded, 586-589 
Trellis diagram, 496 
Triangle inequality, 29-30 
Turbo cliff region, 553 
Turbo codes, 548-558 
error floor, 551 
EXIT charts, 555 
for fading channels, 
1020-1021 
interleaver gain, 552 
iterative decoding, 552 
Max-Log-APP algorithm, 548 
multiplicity, 549 
turbo cliff region, 553 
waterfall region, 553 
Turbo TCM, 586-589 
Turbo decoding algorithm, 552 
Turbo equalization, 671-673 
Typical sequences, 336 


Underspread fading 
channels, 899 
Underwater acoustic 
channels, 9 
Undetected error, 430 
Unequal error protection, 523 
Uniform interleaver, 480-481 
Uniform random variable, 41 
Union bound, 182-186 
Uniquely decodable source 
coding, 339 

Universal source coding, 347 

Variable-length source 
coding, 339 
Variance, 40 

Varshamov-Gilbert bound, 443 
Vector space, 28-30, 410-411 
Vectors 

linearly independent, 29 
norm, 28 
orthogonal, 28 
orthonormal, 28 
Viterbi algorithm, 243-246, 
510-513 

path memory truncation, 
246, 513 

survivor, 244-245, 512 
survivor path, 245, 512 
Voltage-controlled oscillator 
(VCO), 298 
Voronoi region 

of a lattice point, 232 


Waterfall region, 553 
Water-filling interpretation, 
745, 902 
in time, 912 

Waveform channels, 358 
WEF (weight enumeration 
function), 415 
Weight distribution, 411 
Weight distribution polynomial 
(WEP), 415 
Weight enumeration 
function, 415 

Weight of a codeword, 411 
Welch bound, 801 
White processes, 69 
Whitened matched filter 
(WMF), 627 

Whitening filter, 167, 627 
Wide-sense stationary 
process, 67 
Wiener-Khinchin 
theorem, 67 

Wireless electromagnetic 
channels, 5 
Wireline channels, 4 
Word error probability, 417 
WSS (wide-sense stationary), 67 

Yule-Walker equations, 716 

Z transform, 626 
Zero-forcing equalizer, 642 
Zero-forcing filter, 642 


Digital Communications is an excellent choice for graduate-level courses. 
Masoud Salehi of Northeastern University co-authored the book with 
John Proakis and brings a fresh new perspective to this classic text. 

The book was developed to be adaptable to a one-semester or 
two-semester course. Its comprehensive nature makes it a great reference 
tool for students to keep for their professional careers. Convenient, 
sequential organization begins with a look at the history and classification 
of channel models and builds from there. 

This all-inclusive guide delivers an outstanding introduction to the 
analysis and design of digital communication systems and includes 
expert coverage of new topics such as: 

• Turbo codes and LDPC codes 

• Turbo equalization 

• Multiple antenna systems 

• Iterative decoding 

• Capacity and coding for fading channels 

Features include: 

• End-of-chapter problems 

• Progressive organization 

• Complete and thorough introduction to the analysis 
and design of digital communication systems 

Please refer to the Proakis/Salehi website at www.mhhe.com/proakis 